INTERCORRELATIONS AND FACTOR ANALYSIS OF TESTS GIVEN TO TEACHING CANDIDATES


J. C. GOWAN 
University of California at Los Angeles 


THE STUDY reported below grew out of a 
larger program of assessment and evaluation of 
teaching candidates reported elsewhere (7), sup- 
plemented by smaller samples from other teach- 
er training institutions. It is the purpose of the 
present report to point up certain interrelation- 
ships between testing instruments which may 
have significance beyond the context in which 
the tests were given. 


Sample One 
1. Scope of Study 


This study gives intercorrelations and result- 
ant factor analyses from a matrix of order 62 
produced by the scores of teaching candidates on 
a battery of tests. In addition to the large number
of scales making up the matrix, the study is
of interest because of the sizeable numbers of 
subjects involved. These subjects were junior, 
senior and graduate students at the University of 
California, Los Angeles. The tests administered
were part of a required series given by
the Teacher Selection and Counseling Service of 
the School of Education. This agency processes 
about 1400 cases per year. 

It is not the purpose of this paper either to 
explore the literature, or engage in discussion 
about criteria of teaching success, or develop 
teacher prognosis scales. These matters are 
handled elsewhere (1,4,5,6,9,10). The design 
is simply to detail as briefly as possible the 
rather extensive intercorrelations and the factor 
analysis results accruing therefrom. 


2. Tests Used and Method of Procedure 


The tests used in this study were the Cooperative
English Test, the Stanford Arithmetic Test,
the American Council Psychological Examination,
the Minnesota Multiphasic Personality Inventory,
the California Psychological Inventory, the Revised
Study of Values (Allport et al.), the Guilford-Zimmerman
Temperament Survey, and two new
scales on the MMPI alleged to predict teaching 
success or failure. The teaching scales were 
the plus and minus sections of a scale devised by 
the authors and detailed more fully elsewhere (5). 
The three thousand odd correlation coefficients 
needed for the study were obtained from IBM 
cards by a method outlined by J. C. Flanagan (2). 
Briefly, this method consists of determining what 
percent of the top 27 percent criterion group on 
the independent variable lie above the median 
score on the dependent variable, and what percent 
of the bottom 27 percent criterion group on the in- 
dependent variable behave in similar fashion. 
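
As a purely illustrative sketch (the function name, the toy data, and the rounding rule for the size of the criterion groups are assumptions and are not taken from the study), the two percentages on which the tails method rests might be computed as follows. The conversion of the two percentages into a coefficient is made from Flanagan's table (2), which is not reproduced here.

```python
from statistics import median


def tail_percentages(x, y, tail=0.27):
    """Return (pct_upper, pct_lower): the percent of the top and of the
    bottom `tail` fraction of cases on x whose y score lies above the
    median of y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    k = max(1, round(tail * n))  # size of each criterion group (assumed rounding)
    y_median = median(y)
    order = sorted(range(n), key=lambda i: x[i])  # case indices, low to high on x
    lower, upper = order[:k], order[-k:]

    def pct_above(group):
        return 100.0 * sum(1 for i in group if y[i] > y_median) / len(group)

    return pct_above(upper), pct_above(lower)


# Hypothetical data in which y rises perfectly with x
x = list(range(100))
y = list(range(100))
print(tail_percentages(x, y))  # (100.0, 0.0)
```

Running the function once with x as the independent variable and once with y as the independent variable gives the two, possibly different, estimates for the same pair of scales.
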
The names of the scales of the various tests, 
their code symbols, and the variable numbers 
are detailed in Table I. It will be observed that 
the scale codes are grouped so that all scales for 
a certain test have a common initial letter. Care 
has been taken so that the scale numbers correspond
(as in the MMPI) to commonly accepted
usage. Validating scales have small letters. A 
few of the scales were not used throughout the 
study. For example, D5, the Masculinity-Fem- 
ininity scale of the MMPI, was not used because 
the scoring differs for men and women. No attempt
was made to correlate scales D11, D12,
D13 or D14 with any of the later scales, since
Gough used these scales to develop corresponding
scales (E5, E1, E4, E10) on the Psychological
Inventory. The intercorrelations of the A group
with groups beyond D are also missing, since a
factor analysis showed that practically all the
variance of this group was being cared for by the
B group (ACE scores).


*Now of Los Angeles State College. The authors are indebted to the University of California for a
research grant covering in part the expense of this study. Further acknowledgments are due to Dr.
Harrison Gough for permission to use the California Psychological Inventory and for scoring it; to
Mr. Robert Rutz and the IBM room personnel of the UCLA Controller’s Office for running the card
sorts; to the staff of the School of Education Teacher Selection and Counseling Service for cooperation
and assistance; to Gordon Fifer, Fred Machetanz and Enid Janssen who assisted in the computation,
and to Dorothy Kern who drew the figures.


TABLE I


NAMES AND CODE SYMBOLS FOR SCALES USED IN INTERCORRELATION STUDY I


Code    Name of Scale

A       Cooperative and Stanford
A1        Cooperative English Vocabulary
A2        Cooperative English Speed
A3        Cooperative English Level
A4        Cooperative English Mechanics
A5        Cooperative English Effectiveness
A6        Stanford Arithmetic (Part 1)

B       American Council Psychological
B1        Q: Quantitative
B2        L: Linguistic
B3        T: Total

C       Allport Study of Values
C1        T: Theoretical
C2        E: Economic
C3        A: Aesthetic
C4        S: Social
C5        P: Political
C6        R: Religious

D       Minnesota Multiphasic Inventory
Da        L: Lie
Db        F: Falsification
Dc        K: Suppressor Variable
D1        Hs: Hypochondriasis
D2        D: Depression
D3        Hy: Hysteria
D4        Pd: Psychopathic Deviate
D6        Pa: Paranoia
D7        Pt: Psychasthenia
D8        Sc: Schizophrenia
D9        Ma: Hypomania
D10       Si: Social Introversion
D11       Do: Dominance
D12       Re: Social Responsibility
D13       St: Status
D14       Ac: Academic Achievement

E       California Psychological Inventory
Ea        Infrequency
Eb        Gi: Good Impression
Ec        Ds: Dissimulation
E1        Re: Social Responsibility
E2        To: Tolerance
E3        Fl: Flexibility
E4        St: Status
E5        Do: Dominance
E6        Sp: Social Participation
E7        Fe: Femininity
E8        De: Delinquency
E9        Ie: Intellectual Efficiency
E10       Ac: Academic Achievement
E11       Hr: Honor Point Ratio
E12       Py: Psychologist’s Interest
E13       Ne: Neurodermatitis
E14       X1: Poise, Self-confidence
E15       X2: Impulsivity

F       Guilford-Zimmerman Temperament
F1        G: General Energy
F2        Restraint
F3        A: Ascendance
F4        Sociability
F5        E: Emotional Stability
F6        Objectivity
F7        F: Friendliness
F8        T: Thoughtfulness
F9        P: Personal Relations
F10       Masculinity

G       Teacher Prognosis Scales (MMPI)
G1        Tp: Teacher Positive
G2        Tn: Teacher Negative

It was not considered feasible to perform a
factor analysis of the entire matrix, so various
minors were selected for the purpose. In order
to present the matrix in a form which can be
accommodated on paper of standard size, it was
split up into various minors. A schematic
presentation of the matrix with respect to this
sectioning is shown in Tables II and III. Table II
shows the number of students involved in each
section, from which it will be seen that far fewer
cases were covered in the last three groups.
Table III indicates what parts of the master matrix
are displayed in the later tables presenting specific
minors of the determinant.

Tables IV to VII inclusive present various 
minors of this matrix. Tables VIII to XI, inclu- 
sive, present factor analyses resulting from this 
material. The remaining tables in the paper 
concern different samples of a much smaller 
magnitude. 


3. Results and Discussion 


The results, so far as the intercorrelations in
Tables IV to VII are concerned, speak for themselves.
It is not considered feasible to discuss
all the implications raised. Such material, however,
may be valuable to investigators other than
those in education, and is presented with this
in mind.

An interesting empirical check on the standard
error of measurement for the short-cut method
of obtaining correlation coefficients appears
worthy of comment. The method utilized (Flanagan’s
tails method) results in an $a_{ij}$ which may
be different from $a_{ji}$. In Minor I (Table IV) the
difference ($a_{ij} - a_{ji}$) was computed for the 870
coefficients in the 30 x 30 matrix. After eliminating
and correcting errors signaled by high
differences, the following distribution was obtained:


Difference
Interval        Frequency
 20 to  22           1
 17 to  19           1
 14 to  16           5
 11 to  13           5
  8 to  10          12
  5 to   7          38
  2 to   4          76
 -1 to   1         141
 -4 to  -2          94
 -7 to  -5          41
-10 to  -8          16
-13 to -11           4
-16 to -14           1

N = 435     M = 0.06     S.D. = 4.92

The mean of this distribution was gratifyingly
near zero; the standard deviation was 4.92 (in
hundredths, decimals being omitted as in the
tables). The standard error of an ‘‘r’’ of zero for
an N of 1700 is 2.42 in the same units by the usual
formula for the standard error of a Pearsonian ‘‘r’’.
In practice the differences were averaged. It may be
noted that Flanagan (2:347) gives an approximation
for what amounts to the standard error of a
correlation coefficient obtained in the above manner.
In the case where ‘‘r’’ equals 0, it is 1.3 times the
standard error for the Pearsonian coefficient; for an
‘‘r’’ of .45, it is 1.5 times as much. The maximum
standard error values given in the tables are for
Pearsonian ‘‘r’s’’. These facts should be taken
into account in interpreting the tables.
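
The arithmetic of this check can be retraced with a short sketch. It is offered only as an illustration: the interval midpoints are an assumption introduced by grouping, and the ‘‘usual formula’’ is taken here to be 1/sqrt(N) for an ‘‘r’’ of zero, which the text does not spell out. The grouped values come out slightly different from the reported M = 0.06 and S.D. = 4.92, which may well have been computed from the ungrouped differences.

```python
import math

# Grouped distribution of the differences (a_ij - a_ji), in hundredths.
# Midpoints of the three-unit intervals are an assumption of this sketch.
midpoints = [21, 18, 15, 12, 9, 6, 3, 0, -3, -6, -9, -12, -15]
frequencies = [1, 1, 5, 5, 12, 38, 76, 141, 94, 41, 16, 4, 1]

n = sum(frequencies)
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
sd = math.sqrt(sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints)) / n)

# A 30 x 30 matrix has 30 * 29 = 870 off-diagonal cells, hence 435 pairs.
pairs = 30 * 29 // 2


def se_r_zero(n_cases):
    """Standard error of a Pearsonian r of zero, in hundredths (assumed 1/sqrt(N))."""
    return 100.0 / math.sqrt(n_cases)


se = se_r_zero(1700)

print(n, pairs)                          # 435 435
print(round(mean, 3), round(sd, 2))      # 0.007 4.81
print(round(se, 2), round(1.3 * se, 2))  # 2.43 3.15
```

The last figure applies Flanagan's factor of 1.3 for an ‘‘r’’ of zero to the Pearsonian standard error.
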


4. Factor Analysis 


Factor analysis of various selected minors of 
the material in Sample One was done by centroid 
methods outlined by Thurstone (11). Results for 
a 30 x 30 matrix of variables 1 through 30 are 
shown in Table VIII. This matrix consisted of 
variables of the Cooperative English Test, Amer- 
ican Council Psychological Examination, Allport 
Study of Values, and Minnesota Multiphasic Inven- 
tory. Six factors were extracted. These were 
sufficient to account for more than half the vari- 
ance except on the following scales: all scales of 
the Allport; and Lie, Falsification, Psychopath- 
ic Deviate, Paranoia, Hypomania and Status of 
the MMPI. The factors were left unrotated, at 
least in this initial study, since some of the fac- 
tors (such as factor I, which obviously repre- 
sents general intelligence) had considerable psy- 
chological significance as they stood. Some ro- 
tation was attempted later, as shown in Table XI. 
The unrotated factors of factor analysis 1 were 
designated as follows: 


Factor I, with its very high loading on the
A.C.E. (practically up to the reliability coefficient)
and with somewhat lower loadings on the Cooperative
English Test, was named ‘‘Intelligence’’.
As can be seen, it is considerably more verbal
than numerical. The only other loadings above .20
on this factor are positive on Theoretical and
Dominance, and negative on the Lie scale. This
one factor takes out so much of the variance of
the first nine variables that only in Factor V is
there found a single loading of .15 or more. The
A.C.E. Total effectively speaks for the other
variables.

Factor II is rather well defined as ‘‘K’’, the
somewhat mysterious suppressor variable on the
MMPI. Because either fractions
or the full amount of the ‘‘K’’ value are added to
the Hs, Pd, Pt, Sc and Ma scales, care should
be used in concluding that correlations with these
scales fix the description of ‘‘K’’. Of the
‘‘uncontaminated scales’’ the order of factor
loadings is as follows: Hysteria, Responsibility,
Paranoia, Lie, Dominance and Status. Moder- 
TABLE III


SCHEMATIC PRESENTATION OF THE MATRIX SHOWING MINORS EXHIBITED


Code and Test

A   Cooperative English and Stanford
B   American Council Psychological
C   Allport-Vernon Study of Values
D   Minnesota Multiphasic Inventory
E   California Psychological Inventory
F   Guilford-Zimmerman Temperament
G   Teacher Prognosis Scales (MMPI)

[Schematic of the matrix; the cell layout is not legible. Legible cell labels: Minor I (Table IV), Minor III, Minor IV (Table VI).]
TABLE IV
(Minor I)


INTERCORRELATIONS BETWEEN COOPERATIVE ENGLISH, AMERICAN COUNCIL PSYCHOLOGICAL,
ALLPORT STUDY OF VALUES AND MINNESOTA MULTIPHASIC INVENTORY*


Cooperative et al Allport Values
A1 A2 A4 A5 B1 B2 B3 C3 C4
Voc Sp Lev M Eff A S


67 53.55 
-- 53.62 
46 56 
-- 60 


D10 
Dil 
D12 


*Decimals omitted throughout 


~ 
6 
Al -- 29 19 78 65 22 -26 26 -14 
A2 42 48 78 77° 16 -21 22 O00 -05 -14 
A3 34 «65s -20 46-05-14 
A4 37 36 «656 «656 -15 17 12 -13 -06 
AS -- 37 39 61 61 10 -11 14 00 -05 -08 
A6 -- 68 44 61 17 07 +-16 00 
Bl -- 48 78 09 03 -04 -02 O01 04 
. B2 -- 92 23 -24 22 00 -06 -15 
B3 -- 20 -16 OL -04 -14 = -20 
Cl -- -12 -06 -12 -02 -46 
C2 -- -42 -28 20 -18 
C3 -- -13  -25 -30 
c4 -- -24 -04 
C5 -- -30 
C6 
Da 
Db 
De 
D1 
D2 
D3 
D4 
D6 
D7 
D8 
D9 


Hy 


Pd 


Pa 


Pt 


Sc 


Ma 


Si 


Do 


Re 


St 


-04 
-07 
-05 
-07 
-02 
-04 


-08 
-06 
-09 


-04 
-11 
02 
08 
-07 
09 


17 
ll 
Minnesota Multiphasic Inventory
Da Db Dc D1 D2 D3 D4 D6 D7 D8 D9 D10 D11 D12 D13
L F K Hs D Hy Pd Pa Pt Sc Ma Si Do Re St
-16 00 -09 03 -04 01 -05 03-10 06 23 07 14 
-23 -04 00 -08 -08 -i0 -07 -06 04 = -05 03 17 05 13 
-18 -04 02 -08 -06 -09 -05 -07 01 = -10 04 22 07 10 
-06 -07 06 -06 -04 -02 -05 -02 00 07 15 13 13 
-10 -05 04 -04 -07 00 -04 -06 -02 -07 03 13 04 06 
08 -04 -08 -06 -03 10 02 02 
-07 04 -04 -14 -07 -08 -10 -05 -O1 -05 08 01 09 
-20 -06 02 -10 -02 -08 # -04 03 -04 00 24 14 18 
| -02 01 -10 -08 -08 -08 # -05 -08 -01 -01 00 20 04 16 
-09 14 -02 -04 10 -03 08 -06 01 -03 04 13 04 10 
| 00 15 -09 -02 14 04 00 08 10 00 08 08 04 14 
| 09 = -02 09 05 06 06 08 06 10 + -02 01 05 10 04 
-08 00 -03 -06 -07 -06 00 -08 -07 09 -12 11 -10 08 
06 07 -12 02 06 -04 -02 -02 -23 10 -23 
- << 46 24 03 33 11 | 00 00 -16 -12 03 47 08 
09 28 06 18 23 30 12 24 -05 -18 -09 
ee 57 -10 51 39 23 18 47 -16 -44 31 52 22 
a 31 70 40 28 48 50 -02 -03 -03 12 + -01 
22 32 19 52 26 8-22 -12 
45 38 36 40 -06  -20 10 25 10 
ee 31 42 52 1 80 04 07 06 
oo 36 36 -02 -05 06 10 00 
64 08 20 -28 -08 
«230 -04 -28 10 
39 52 
2 oo 19 
TABLE VIII
(Factor Analysis I)


UNROTATED FACTOR LOADINGS FOR MATRIX OF VARIABLES 1 THROUGH 30*


Independent Variable h²


Cooperative English Vocabulary 

Cooperative English Speed 85 
Cooperative English Level 76 
Cooperative English Mechanics 69 
Cooperative English Effectiveness 73 
Stanford Arithmetic 63 


Quantitative 66 
Linguistic 89 82 
Total 93 88 


Theoretical 20 -49 30 
Economic -18 -19 42 
Aesthetic 17 09 -16 34 
Social 02 02 22 07 
Political -07 -13 -38 04 30 
Religious -13 01 66 06 47 


Lie -20 -14 19 -40 43 
Falsification -06 38 8 -35 06 29 
Suppressor Variable 04 -18 19 -04 74 
Hypochondriasis -09 34 -04 58
Depression -08 63 -15 -32 56
Hysteria -08 17 04 -06 60
Psychopathic Deviate -05 32 -12 20 47
Paranoia -08 19 05 02 24 
Psychasthenia -08 67 03 11 62 
Schizophrenia 00 49 -07 
Hypomania -07 00 -12 57 35 
Social Introversion 02 56 00 -35 57 
Dominance 22 33 -51 -31 -04 53 
Social Responsibility 08 47 -33 14 -39 52
Status 15 32 -49 -29 13 10 48 


D10 
Dil 
D12 
D13 


Q: 
L: 
T: 
T: 
E: 
A: 
S: 
P: 
R: 
L: 
F: 
K: 
Hs: 
D: 
Hy: 
Pd: 
Pa: 
Pt: 
Sc:
Ma: 
Si: 
Do: 
Re: 
St: 


*Decimals are omitted throughout. For further identification of variables, see Table I. 
A1 02 03 -06 40 00 73
A2 -02 01 -03 13 06 74
A3 -01 00 -05 19 00 62
A4 03 00 14 13 -06 52
A5 01 O01 05 O58 -O1 54 
A6 -01 00 04 -42 -02 58 
B1
B3
C1
C2
C3
C4
C6
Da
Db
Dc
D1
D2 
D3 
D4 
D6 
D7 
D8 
D9 
ate amounts of ‘‘K’’ may be desirable for integrated
ego functioning. The factor is designated
as ego-sensitivity.

Factor III is pretty well described as a ‘‘hopelessness’’
indicator. It has its highest loadings
on D, Pt and Si. There is social withdrawal
here also, as indicated by the positive loadings
on Sc and Si, and the negative loading on Do.

Factor IV is an Allport factor designated here 
as ‘‘mystical’’ because of its high religious load- 
ing, but also because of the negative loadings on 
theoretical and dominance. Notice that persons 
described by this factor do not withdraw although 
they may not seek to lead, and that they do not 
falsify although they may tend to be subjective. 

Factor V represents the pole of verbal-artis- 
tic versus mathematical-practical. The guess 
is hazarded that this factor would show consider- 
able sex difference. The largest variances 
come on the Allport, but there are important 
loadings on Vocabulary and ‘‘Q’’. 

Factor VI seems to be less well defined than
the others. It appears to represent manic tendencies,
especially those connected with not following
through on a job. It is designated as
‘‘manic irresponsibility’’.
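
For readers unfamiliar with the centroid method, the extraction of a first factor can be sketched as below. This is a minimal illustration under stated assumptions and is not the computation performed in the study: the 3 x 3 matrix is hypothetical, its diagonal is assumed to hold communality estimates, and the function names are invented for the example.

```python
import math


def first_centroid_factor(r):
    """First centroid loadings: each column sum divided by the square
    root of the grand total of the matrix."""
    n = len(r)
    col_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    root = math.sqrt(sum(col_sums))
    return [s / root for s in col_sums]


def residual(r, a):
    """Residual matrix after removing the factor with loadings a."""
    n = len(r)
    return [[r[i][j] - a[i] * a[j] for j in range(n)] for i in range(n)]


def communality(loadings):
    """h-squared: the sum of the squared loadings of one variable."""
    return sum(x * x for x in loadings)


# Hypothetical correlation matrix with communality estimates in the diagonal
r = [
    [0.81, 0.72, 0.45],
    [0.72, 0.64, 0.40],
    [0.45, 0.40, 0.25],
]
a = first_centroid_factor(r)
print([round(v, 2) for v in a])  # [0.9, 0.8, 0.5]
```

Later factors are taken from the residual matrix in the same way, after reflecting the signs of some variables so that the column sums do not vanish; that step is omitted from the sketch.
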


Factor Analysis II is detailed in Table IX.
From a matrix of 20 variables selected from the
California Psychological Inventory and the
Guilford-Zimmerman Temperament Survey, three
factors were extracted and rotated to oblique simple
structure. The rotated factors might be designated
as: I, General Teaching Adjustment; II, Thoughtfulness
or anti-delinquency; and III, General Energy.
The rather considerable loadings on Factor I
round out the description of what teaching
adjustment as measured by the Teacher Prognosis
Scale G1 represents.

Factor Analysis III is an attempt to combine
some of the leading variables of I and II. Table
X details the unrotated and rotated loadings
of the three factors extracted from ten selected
variables. Again Factor I seems to be General
Teaching Adjustment, Factor II General Energy,
and Factor III related to Status, Poise or Flexibility.
Reversing Factors II and III between
this and the last factor analysis, it appears that
the same factor space is described.

In Table XI, there is a short but interesting
digression with regard to Factor Analysis I.
The loadings on the first three factors, which
account for most of the variance, have been
normalized and plotted on a sphere, the positive
hemisphere of which is shown in Figure 1. It
appears that while the A.C.E. variable is at one
pole of the sphere, the other two poles represent
something like ‘‘K’’ and anti-depression. The MMPI
scores then seem to array themselves on a band of
narrow width, nearly in the plane of the equator (or
great circle) of the A.C.E. pole. They can, thus, be
expressed by a single parameter angle, representing
their deviation from an ideal pole ‘‘anti-depression’’.
Rotation, hence, seems unnecessary.
The concept of the different scales of the MMPI
being spread out in this fashion introduces some
very interesting speculations and possibilities
which only further research can verify or disprove.
These facts should be considered, of
course, in the light of the amount of variance that
any particular scale has with respect to these
three factors. The fact that paranoia and psychopathic
deviate are nearly together does not indicate
of itself that they represent the same thing,
since only 23 percent of the variance of the first
and 41 percent of the variance of the second is
involved. Nevertheless, the relationships between
the MMPI scales are certainly rather
graphically revealed in at least some of their
dimensions by Figures 1 and 2.


Samples Two and Three 


1. Introduction 


Samples Two and Three consist of much smal- 
ler populations and far fewer test scales. They 
are included chiefly because of the confirmation 
of the resulting factor analyses with some of the 
factor space and location of the vectors of the pre- 
vious analyses. It is considered significant that 
different factor analyses done with different tests 
and on different populations should turn up simi- 
lar results. 

The populations used in these samples were 
Education juniors at Los Angeles State College. 
The number of cases included in Sample Two was 
110 and in Sample Three 86. There was no over- 
lapping of personnel between the samples. The 
IBM equipment was not used to secure the inter- 
correlations, but the method of Flanagan, previ- 
ously mentioned, was employed. 


2. Description of the Variables of Sample Two 


Eight variables were used in Sample Two: 1)
a rating on authoritarianism, using a modified
Adorno ‘‘F’’ scale, 2) a rating on socio-economic
status of parents of the type used by Sims
(ratings were made by respondents themselves),
3) a sample of the Tp scale containing about half
the items, 4) a sample of the Sc scale containing
about half the items, 5) a sample of the Tn scale
containing about half the items, 6) a sample of
the D scale containing about half the items, 7) the
Minnesota Teacher Attitude Inventory score, and
8) intelligence as measured by the Army General
Classification Test. Intercorrelations of these
variables are shown in Table XII.


3. Results of the Factor Analysis 
Three factors were extracted. Table XIII 


TABLE IX 
(Factor Analysis II) 


FACTOR LOADINGS FOR MATRIX OF CERTAIN VARIABLES IN E, F, G SERIES* 


Unrotated Rotated**


Independent Variable h² I III


Gi: Good Impression 72 76 -50 
Dissimulation 66 -87 -05 
Social Responsibility -18 41 30 70 
Flexibility 06 33 30 
Status 44 55 78 #8658 
Dominance 50 52 68 20 
Social Participation 53 68 75 26 
Delinquency 32. -36 -93 
Intellectual Efficiency 29 60 90 833 
Academic Achievement -12 42 04 -35 
Poise, Self-confidence 74 32 #80 40 84 
Impulsivity 74 +11 80 -47 07 


General Energy 34-27 20 25 05 
Ascendance 39-11 38 20 
Stability -31 -09 50 89 -46 
Friendliness -45 45 £70 65 -03 
Thoughtfulness 26 -37 21 10 +77 
Personal Relations -27 26 50 85 -06 


Teacher Positive 72 -01 -09 52 99 -0O1 
Teacher Negative -63 10 O08 40 -98 05 


Gl 
G2 


Ds: 
Re: 
Fl: 
St: 
Do: 
Sp: 
De: 
Te: 
Ac: 
X1: 
X2: 
G: 
A: 
E: 
F: 
T: 
P: 
Tp: 
Tn: 


* Decimals omitted throughout. For further identification of the variables, see Table I. 
**Correlations between the rotated factors: I and II, .00; I and III, .00; II and III, .50.
Eb -31
Ec 46 
El -83 
E3 -83 
E4 00 
E5 60 
E6 45 
E8 -30 
E9 00 
E10 -10 
E14 00 
E15 30 
Fl 88 
F3 42 
F5 -10 
F7 -75
F8 40 
F9 -47 
. 00 
TABLE X
(Factor Analysis III)


FACTOR LOADINGS FOR MATRIX OF TEN SELECTED VARIABLES*


                                                     Unrotated               Rotated**
Independent Variable                              I    II   III    h²      I    II   III

B3   American Council Psychological Total        15    17   -04    05     65    10    60
Dc   K: Suppressor Variable on MMPI              74   -33   -06    66     75   -55    10
D7   Pt: Psychasthenia on MMPI                  -07   -36    42    31    -67   -98    10
Eb   Gi: Good Impression on CPI                  60   -47   -39    73     75   -30   -25
E3   Fl: Flexibility on CPI                      51   -04    57    58     02   -92    72
E4   St: Status on CPI                           61    42    14    56     55   -30    82
E14  X1: Poise, Self-confidence on CPI           50    68    12    72     42   -05    80
F1   G: General Energy on G-Z                   -01    41   -35    29     35    85    00
F7   F: Friendliness                             66   -38   -08    58     73   -60    05
G1   Tp: Teacher Positive on MMPI                67    06   -43    63     99   -01    01


* Decimals are omitted throughout. For further identification of variables, see Table I.
**Correlations between the rotated factors: I and II, .00; I and III, .00; II and III, .50.


TABLE XI
(Detail of Factor Analysis I)


NORMALIZATION OF FIRST THREE FACTORS FOR CERTAIN VARIABLES
OF FACTOR ANALYSIS I*


                                                      Loadings             Normalized
Independent Variable                              I    II   III    h²      I    II   III   Beta**

B3   American Council Psychological Total        93   -02   -01    86     99   -02   -01
Dc   K: Suppressor Variable on MMPI              04    80   -18    67     05    98   -22     80°
D1   Hs: Hypochondriasis                        -09    66    34    55    -12    89    45    115°
D2   D: Depression                              -08    13    63    42    -12    20    97    168°
D3   Hy: Hysteria                               -08    74    17    57    -10    98    22    100°
D4   Pd: Psychopathic Deviate                   -05    56    32    41    -08    88    50    120°
D6   Pa: Paranoia                               -08    44    19    23    -16          40    113°
D7   Pt: Psychasthenia                          -08    39    67    60    -10    50    86    150°
D8   Sc: Schizophrenia                           00    60    49    60     00    77    63    130°
D10  Si: Social Introversion                     02   -35    56    43     03   -54    86    210°
D11  Do: Dominance                               22    33   -51    41     34    52   -80     33°
D12  Re: Social Responsibility                   08    47   -33    33     14    82   -58     58°
D13  St: Status                                  15    32   -49    36     25    53   -82     32°


* Decimals are omitted throughout. For further identification of the variables, see Table I.

**If the normalized loadings for the three factors are represented by $a_{1j}$, $a_{2j}$, $a_{3j}$, then if
$(a_{1j})^2 = 0$, approximately (or is less than .1), $a_{2j} = \sin B$ and $a_{3j} = \cos (B - 180°)$. In other
words, B is the parameter angle along which the MMPI scales seem to be distributed. (See Figures 1
and 2.)
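
The normalization and the parameter angle of Table XI can be retraced with the following sketch. It is an illustration only: the function names are invented here, and recovering B with a two-argument arctangent is an assumption that merely agrees with the relations stated in the footnote. Small differences from the tabled angles are to be expected, since the tabled values are rounded and the first loading is only approximately zero.

```python
import math


def normalize(loadings):
    """Scale a variable's loadings to unit length (divide by the square
    root of the sum of squares)."""
    length = math.sqrt(sum(x * x for x in loadings))
    return [x / length for x in loadings]


def parameter_angle(a2, a3):
    """Angle B in degrees such that a2 = sin B and a3 = cos(B - 180) = -cos B."""
    return math.degrees(math.atan2(a2, -a3)) % 360


# Depression (D2): unrotated loadings .-08, .13, .63 on Factors I, II, III
a1, a2, a3 = normalize([-0.08, 0.13, 0.63])
print(round(parameter_angle(a2, a3)))  # 168

# Social Introversion (D10): loadings .02, -.35, .56
b1, b2, b3 = normalize([0.02, -0.35, 0.56])
print(round(parameter_angle(b2, b3)))  # 212
```

The second value compares with the tabled 210°.
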
TABLE XII
INTERCORRELATIONS OF VARIABLES OF SAMPLE TWO (LOS ANGELES STATE COLLEGE)*


Code and Scale I1 G1 D8 G2 D2


H1  Authoritarianism from Modified Adorno Scale -05 -27 37 15 10 -41
I1  Socio-economic Status of Parents (Sims type) 20 -20 -28 -15 -14
G1  Tp: Teacher Positive (MMPI)** -65 -58 -22 33
D8  Sc: Schizophrenia (MMPI)** 60 31 -24
G2  Tn: Teacher Negative (MMPI)** 34 -42
D2  D: Depression (MMPI)** -30
J1  Minnesota Teacher Attitude Inventory (MTAI) 54
K1  Intelligence: Army General Classification Test


* Decimals are omitted throughout. For identification of the scales, consult Table I and introductory
context to Table XII.


**Represents only a sample of the scale named. 


TABLE XIII 
(Factor Analysis IV) 


FACTOR LOADINGS FOR MATRIX OF TABLE XII (LOS ANGELES STATE COLLEGE)* 


Unrotated Rotated**


Independent Variable


H1  Authoritarianism (Modified Adorno) 79 -17 71

I1  Socio-economic Status of Parents 16 53 42 -10
G1  Tp: Teacher Positive (MMPI) 13 -08 51 25
D8  Sc: Schizophrenia (MMPI) 70 -63 30 98 -74
G2  Tn: Teacher Negative (MMPI) 79 -09 00 63 20
D2  D: Depression (MMPI) 53 -03 -03 28 -05
J1  Minnesota Teacher Attitude Inventory -65 -50 -06 67 80 -37
K1  Intelligence: (AGCT) 58 -40 -49 74 65 -10


* Decimals are omitted throughout. For further identification of variables refer to Table I or
Table XII.
**Correlations between the rotated factors: I and II, -.09; I and III, -.09; II and III, .03.
TABLE XIV
INTERCORRELATIONS OF VARIABLES OF SAMPLE THREE (LOS ANGELES STATE COLLEGE)*


Code and Scale G2 H1


Bell Adjustment: Home -46 -13 
Bell Adjustment: Health -51 07 
Bell Adjustment: Social -43 -24 
Bell Adjustment: Emotional -37 -32 
Bell Adjustment: Vocational -35 
RAPH: Rigidity 25 30 
MTAI: Teacher Attitude -47 -30 
Tp: Teacher Positive (MMPI) -47 -52 
Tn: Teacher Negative (MMPI) 42 


*Decimals are omitted throughout. For identification of the scales, consult Table I and introductory 
context to Table XIV. Maximum standard error of R is .11. 


TABLE XV
(Factor Analysis V)


FACTOR LOADINGS FOR MATRIX OF TABLE XIV (LOS ANGELES STATE COLLEGE)*


                                        Unrotated            Rotated**
Independent Variable                  I    II   III    h²

Bell Adjustment: Home                65   -36    11    56
Bell Adjustment: Health              53   -53    42    74
Bell Adjustment: Social              60   -41   -35    65
Bell Adjustment: Emotional           77   -40   -22    80
Bell Adjustment: Vocational          47    13   -31    34
RAPH: Rigidity                      -52   -54   -08    56
MTAI: Teacher Attitude               56    65    26    80
Tp: Teacher Positive (MMPI)          77    19   -01    63
Tn: Teacher Negative (MMPI)         -75    03   -26    62
Authoritarianism (Adorno)           -49   -34    33    46


* Decimals omitted throughout. For identification of variables see Table XIV.
**Correlations between the rotated factors: I and II, .47; I and III, .53; II and III, -.24.
Figure 3.
(Subscripts indicate the factor analysis which located the point. [Symbol not legible] means antipodes
of the vector point of contact with the sphere.)
shows the unrotated and rotated loadings for oblique
simple structure. The first factor seems
to be General Teaching Adjustment, the second
Authoritarianism and the third related to Status.


4. Description of the Variables of Sample Three 


The ten variables of Sample Three were four
of those used in Sample Two and six new ones.
The first five consisted of the home, health, social,
emotional and vocational scales of the Bell
Adjustment Inventory (Adult Form). These
scales were reversed so that a ‘‘low’’ score was
considered high in desirability in interpretation;
in other words, the scales were turned so that
the desirable social result was ‘‘up’’. In consequence,
these scales would be expected to correlate
positively with a scale of adjustment and negatively
with a scale of maladjustment. The sixth
measure was the so-called RAPH scale, a measure
of rigidity of attitudes regarding personal
habits developed by Meresko (8). The last four
scales were the Minnesota Teacher Attitude Inventory,
the Tp and Tn scales for the MMPI,
and the Adorno-type authoritarianism scale used
in Sample Two. The intercorrelations of these
variables are shown in Table XIV.


5. Factor Analysis Results 


Three factors were extracted. Table XV 
shows the unrotated and rotated loadings for ob- 
lique simple structure. The first factor again 
seems to be General Teaching Adjustment, and 
the second again Authoritarianism. The third 
seems to be most closely related to the Bell Emo- 
tional Scale. 


6. Collated Results 


When figures were drawn to represent the
several factor analyses and were compared, it
appeared that most of them displayed common
factor space. It is recognized that much of the
variance of the variables cannot be expressed in
three dimensions, yet there seemed to be enough
of a ‘‘common view’’ to make it worthwhile to
superimpose the diagrams. This has been done in
Figure 3, which shows the vector intersections
of selected scales with the positive hemisphere,
pooling the results of a number of different factor
analyses. The Tp point has been used to locate
the center pole, and the authoritarian pole
is at the extreme left, so that the vertical axis
is its great circle. Lines have been drawn between
variables which purport to measure the
same thing. Subscripts indicate which factor
analysis was involved. It will be noted that
there is considerable uniformity in the position
of the points, even as between different factor
analyses. The angle between authoritarianism
and intelligence, for example, seems to be about
135 degrees. The 30 degree and 60 degree small
circles have been drawn, and various areas of the
circumference great circle have been named.
From Figure 3, Table XVI has been constructed.
This table gives in very rough form estimations
of the correlations between selected clusters
noted in Figure 3. It also gives the angular deviation
measured from the pole between these clusters.
Such an arrangement provides a kind of
circular coordinate system. It is to be emphasized
that the measurements are rough and
therefore inexact. The arrangement of these vectors
in the factor space is perhaps made more
understandable by such a procedure. At least,
their relationships to each other in the common
factor space become more apparent. It is the
contention of the writers that the existence of this
common factor space as revealed by several factor
analyses of different tests on different populations
helps to further understanding with regard
to the interrelationships between these variables.
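
The link between an angular separation in the common factor space and a rough correlation estimate can be sketched as follows. The sketch rests on the usual geometric interpretation, in which the correlation between two test vectors is the product of their lengths and the cosine of the angle between them; it is not claimed that the estimates of Table XVI were obtained in exactly this way, and the function name and default lengths are assumptions of the example.

```python
import math


def correlation_from_angle(angle_degrees, h_i=1.0, h_j=1.0):
    """Rough correlation implied by the angle between two test vectors
    of lengths h_i and h_j (square roots of the communalities)."""
    return h_i * h_j * math.cos(math.radians(angle_degrees))


# Unit-length vectors separated by about 135 degrees,
# as read for authoritarianism and intelligence
print(round(correlation_from_angle(135), 2))  # -0.71
```

Where the vectors are shorter than unit length, the estimate shrinks in proportion to the product of their lengths.
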
The writers believe further interpretation 
should await corroboration by others and further 
exploration. They are aware of the rough and of- 
ten informal methods utilized with some of the da- 
ta and of the incompleteness of many of its parts. 
It was Thurstone himself who said, in this regard: 


The exploratory nature of factor analysis
is not often understood. Factor analysis
has its principal usefulness at the
borderline of science.... These new
methods have a humble role. They enable
us only to make the crudest first
map of a new domain. But if we have scientific
intuition and sufficient ingenuity,
the rough factorial map of the new domain
will enable us to proceed beyond.....
(11:56)


It is in this light that these explorations are of- 
fered. 


Summary 


This paper reports the intercorrelations and 
resulting factor analyses from giving extensive 
testing batteries to teaching candidates. The ma 
jor work sample was at UCLA, involving numbers 
ranging upwards to 1700 subjects and scales on 
Cooperative English, Stanford Arithmetic, Amer- 
ican Council Psychological, Allport Study of Val- 
ues, Minnesota Multiphasic, California Psycho- 
logical, Guilford-Zimmerman Temperament, 
and two scales on Teaching Prognosis devised by 
the writers. The minor work samples included 
two groups of about 100 subjects at Los Angeles 
State College, involving intelligence, status, authoritarianism,
Minnesota Teacher Attitude Inventory, Bell Adjustment Inventory,
and the previously mentioned teaching scales.

Results of the factor analyses seemed to show 
a common factor space, and helped to clarify the 
relation of other generally used variables to this 
measure of teaching potential. Further investi- 
gations appear in order. 


APPLICATION OF ANALYSIS OF VARIANCE
TO THE ESTIMATION OF THE RELIABILITY
OF OBSERVATIONS OF TEACHERS’ CLASSROOM BEHAVIOR


DONALD M. MEDLEY and HAROLD E. MITZEL 
Division of Teacher Education 
Municipal Colleges of New York City 


MANY EDUCATORS and psychologists have 
come to believe that the efficient way to esti- 
mate the internal consistency of a measuring in- 
strument is to divide it into halves, score each 
half separately, correlate the pairs of scores so 
obtained by means of the Pearson product-mo- 
ment coefficient, and correct this value for the 
fact that it is based on halves instead of wholes. 
This procedure was developed independently by 
Spearman (13) and by Brown (2) for the solution 
of certain test construction problems. It is also
widely believed that the efficient way to esti- 
mate the stability of an instrument is to adminis- 
ter equivalent forms to a sample of subjects and 
correlate the two sets of scores so obtained. It 
is the purpose of this paper to suggest the effi- 
ciency of the analysis of variance technique for 
estimating the reliability of educational meas- 
ures, and to illustrate use of the technique on ob- 
servations of teachers’ classroom behavior. 

The analysis of variance has been suggested 
as a method for estimating test reliability by 
Jackson (6,7), Hoyt (5) and Alexander (1). Its 
use for estimating the reliability of grades assigned
to compositions or essay examinations
was described by Pilliner (11). Lindquist (8:
357-82) has recently presented a rather complete
discussion of the general use of the analysis
of variance technique for reliability estimation
in educational and psychological measurement.
The method is particularly well adapted
to observational data, as Lindquist remarks,
but concrete examples of its proper use are not
available in the literature.

In connection with a longitudinal study of 
teacher education graduates of the New York 
City municipal colleges (14), the reliability of 
two observational techniques for assessing 
teachers’ classroom behaviors was studied. 
The method of estimating reliability used in 
this longitudinal study will be described briefly, 
and its application to the classroom observation 
data will be illustrated. Methods of computation
will not be given since they are readily accessible
in other sources; emphasis will be on the
logic of the procedure and the interpretation of re- 
sults. 


Method of Analysis and Definition of Terms 


Suppose that N teachers are visited m times 
each by a team of n observers. Each teacher will
be assigned a score on the dimension of interest 
by each observer on each visit, yielding a total 
of mn scores per teacher and a grand total of 
Nmn observations for analysis. 

Among the factors which may be expected to 
produce variation among the scores are two: dif- 
ferences among teachers and differences among 
visits. For convenience, $T_i$ will be used to represent
the deviation (from the mean of all observations)
associated with Teacher i, and $V_j$ will be
used to represent the deviation associated with
Visit j. It is understood that $T_i$ will be the same
for Teacher i on every visit and $V_j$ will be the
same for all teachers on the jth visit to each of
them.

If $P_{ij}$ is the performance of Teacher i on visit
j, it is probable that


$$P_{ij} \neq T_i + V_j$$


In other words, there is likely to be an "interaction"
between visits and teachers: some teachers
may do better on the first visit than on any
other; other teachers may do better on the last
visit, etc. Therefore, let


$$I_{ij} = P_{ij} - (T_i + V_j); \tag{1}$$


$I_{ij}$ is the interaction component of a teacher’s
score on a particular occasion.

When a particular observer k visits a particular
teacher i on a particular occasion j, the
score $X_{ijk}$ (taken as a deviation from the mean
of all values of $X_{ijk}$) that that observer assigns
to the teacher may not be identically equal to
$P_{ij}$, the actual performance of that teacher on
that visit. Define


$$e_{ijk} = X_{ijk} - P_{ij} = X_{ijk} - T_i - V_j - I_{ij} \tag{2}$$


The "error" $e_{ijk}$ will include all parts of the
score not otherwise accounted for in equation
(2).

$I_{ij}$ will be referred to as the visit error for
teacher i on visit j; $e_{ijk}$ as the residual or observer
error for teacher i on visit j in observation
k. Error is thus partitioned into two parts:
one containing errors due to lack of stability
in teacher performance, and the other containing
all errors independent of such lack of stability.
(The latter component is referred to as
"observer" error because it will show up in the
discrepancy between two records of the same
performance made by two different observers.)

If (2) is rewritten as follows:


$$X_{ijk} = T_i + V_j + I_{ij} + e_{ijk},$$


and if both sides are squared and the operation
of taking mathematical expectations in the population
(generated as N, m, and n all approach
infinity) is performed, the result may be written:


$$\sigma_x^2 = \sigma_t^2 + \sigma_v^2 + \sigma_{tv}^2 + \sigma^2, \tag{3}$$


where $\sigma_x^2$ is the total variance for all observations
X, $\sigma_t^2$ is the variance of the $T_i$, $\sigma_v^2$ of the
$V_j$, $\sigma_{tv}^2$ of the $I_{ij}$, and $\sigma^2$ of the $e_{ijk}$, in their
respective populations.

What is meant by the "reliability" of a scale
depends on what true score is of interest, since
the error in a score is the difference between it
and the true score it estimates. As will be
shown, the reliability of a scale as a measure
of $P_{ij}$ will generally be greater than its reliability
as a measure of $T_i$. In the present instance, the
true score of interest is $T_i$, the mean of all performances
$P_{ij}$ of teacher i on all occasions j on
which a visit might be made to the teacher. Ideally,
the population of visits j should include all possible
situations that arise during a teacher’s career.
More realistically, it should include all situations
during a particular school year or term;
this could be approximated by use of proper
sampling procedures in selecting the times at
which observations are made. Then $T_i$ would
represent the "typical" performance of Teacher i.

Similarly, the nature of the population of 
teachers is not clearly perceived unless the 
teachers observed are drawn at random from a 
specified population of teachers. It will be assumed,
however, that both populations do exist,
whether or not they can be specified.

The reliability of a score based on a single observation
may then be defined as


$$R = \sigma_t^2 / (\sigma_t^2 + \sigma_{tv}^2 + \sigma^2) \tag{4}$$


The numerator on the right is the variance of
$T_i$; that is, the "true score" variance. The denominator
is the sum of the true score variance
and the two error variances. Comparison with
equation (3) reveals that this sum represents the
total variance of the scores with the component
due to visit differences, $\sigma_v^2$, removed. This variance
is removed because we will compare teachers
who have all been visited equally often, and
the scores will be means over all visits so that
the visit effects are cancelled out. The denominator
in equation (4) is the total variance of these
scores about their mean. The reliability coefficient
is thus seen to be the ratio of true score
variance to total variance, or, in other words, the
proportion of the total variance attributable to differences
among teachers.

This reliability coefficient R is the parameter 
that is usually estimated by correlating scores as- 
signed to a set of teachers by two observers vis- 
iting the teachers at different times. This meth- 
od of estimating reliability is quite unsatisfactory, 
however, since only two scores per teacher can 
be used, with the result that the estimate has 
very low precision when the number of teachers 
is small. 

A second type of "reliability" coefficient that
is sometimes used regards $P_{ij}$, the true performance
of teacher i on visit j, as the true score to
be estimated. This coefficient will be referred
to here as the coefficient of observer agreement,
R', and may be defined as follows:


$$R' = (\sigma_t^2 + \sigma_{tv}^2) / (\sigma_t^2 + \sigma_{tv}^2 + \sigma^2) \tag{5}$$


In this case, fluctuations in teacher performance
are regarded as part of true score variance,
since they are capable of being observed by all ob- 
servers present on a particular occasion. This 
coefficient may also be estimated by correlating 
scores assigned to a group of teachers by two ob- 
servers visiting each of them at the same time; 
it is a measure of observers’ ability to agree in 
their records of the same performance. R' does 
not indicate how reliably the teachers are dis- 
criminated, and therefore should not be referred 
to as a reliability coefficient. 

The reliability of the mean of a number of
scores assigned to the same teacher is easily derived.
In terms of the observer team size n, and
the number of visits, m, it is $R_{mn}$, where


$$R_{mn} = mn\sigma_t^2 / (mn\sigma_t^2 + n\sigma_{tv}^2 + \sigma^2) \tag{6}$$

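The step described as easily derived may be sketched as follows, under the independence assumptions the paper places on the components. The symbols $\bar{X}_i$ (the mean of the mn scores for teacher i) and $\bar{V}$ (the mean visit effect, common to every teacher visited equally often) are introduced here for illustration only and do not appear in the original.

```latex
\begin{aligned}
\bar{X}_i - \bar{V} &= T_i + \frac{1}{m}\sum_{j=1}^{m} I_{ij}
      + \frac{1}{mn}\sum_{j=1}^{m}\sum_{k=1}^{n} e_{ijk}, \\
\operatorname{Var}\!\left(\bar{X}_i - \bar{V}\right) &= \sigma_t^2 + \frac{\sigma_{tv}^2}{m} + \frac{\sigma^2}{mn}, \\
R_{mn} &= \frac{\sigma_t^2}{\sigma_t^2 + \sigma_{tv}^2/m + \sigma^2/mn}
        = \frac{mn\,\sigma_t^2}{mn\,\sigma_t^2 + n\,\sigma_{tv}^2 + \sigma^2}.
\end{aligned}
```

Setting m = n = 1 recovers the single-observation coefficient R of equation (4).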
If it is assumed that $T_i$, $V_j$, $I_{ij}$, and $e_{ijk}$ are
normally and independently distributed in repeated
random sampling with zero means and variances
$\sigma_t^2$, $\sigma_v^2$, $\sigma_{tv}^2$, and $\sigma^2$, respectively, then
the values of these variance components and
hence of the coefficients R and R' may be estimated
from an analysis of variance table of the
form shown in Table I.

Table I is based on samples of N teachers, 
m visits, and teams of n observers each. The 
observed mean squares and their expectations 
in terms of variance components are shown at 
the right. Estimates of the variance components 
of interest may be obtained from the observed 
mean squares as follows (the symbol "(=)" may
be read, "is estimated by"):


$$\sigma^2 \;(=)\; s^2$$
$$\sigma_{tv}^2 \;(=)\; (s_{tv}^2 - s^2) / n \tag{7}$$
$$\sigma_t^2 \;(=)\; (s_t^2 - s_{tv}^2) / mn$$


By substituting these estimates and the appropriate
values of m and n in equations (4) to
(6), estimates of the coefficients of reliability
and of observer agreement secured in a given
experiment may be obtained.
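Since the paper leaves the computation of the mean squares to other sources, a minimal sketch of how the four rows of Table I might be obtained from raw scores is given here. It assumes a balanced design; the function name `anova_table`, the nested-list layout `x[i][j][k]` (teacher i, visit j, observer k), and the toy scores are illustrative assumptions, not data from the study.

```python
from statistics import mean


def anova_table(x):
    """Return {source: (d.f., sum of squares, mean square)} for a balanced
    layout x[i][j][k]: teacher i, visit j, observer k."""
    N, m, n = len(x), len(x[0]), len(x[0][0])
    grand = mean(s for teacher in x for visit in teacher for s in visit)

    teacher_means = [mean(s for visit in teacher for s in visit) for teacher in x]
    visit_means = [mean(x[i][j][k] for i in range(N) for k in range(n))
                   for j in range(m)]
    cell_means = [[mean(x[i][j]) for j in range(m)] for i in range(N)]

    ss_teachers = m * n * sum((t - grand) ** 2 for t in teacher_means)
    ss_visits = N * n * sum((v - grand) ** 2 for v in visit_means)
    ss_cells = n * sum((c - grand) ** 2 for row in cell_means for c in row)
    # Teacher-by-visit interaction, called "visit error" in Table I.
    ss_visit_error = ss_cells - ss_teachers - ss_visits
    # Disagreement among observers recording the same performance.
    ss_observer = sum((x[i][j][k] - cell_means[i][j]) ** 2
                      for i in range(N) for j in range(m) for k in range(n))

    df = {"Teachers": N - 1, "Visits": m - 1,
          "Visit Error": (N - 1) * (m - 1), "Observer Error": N * m * (n - 1)}
    ss = {"Teachers": ss_teachers, "Visits": ss_visits,
          "Visit Error": ss_visit_error, "Observer Error": ss_observer}
    return {src: (df[src], ss[src], ss[src] / df[src]) for src in df}


# Hypothetical scores: 3 teachers, 2 visits each, 2 observers per visit.
scores = [[[3, 4], [5, 5]],
          [[6, 7], [8, 6]],
          [[2, 3], [4, 2]]]
table = anova_table(scores)
```

The mean squares in the third position of each entry are the quantities that enter the estimation equations (7).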

It is also possible to test the hypotheses,


$$H_0\colon \sigma_t^2 = 0$$

and

$$H_1\colon \sigma_{tv}^2 = 0.$$


Hypothesis $H_0$ states that the scale fails to
discriminate among teachers; hypothesis $H_1$
states that there is, on the average, no greater
variation between two records based on different
visits than between two records based on the
same visit.

$H_1$ is tested by comparing


$$F_1 = s_{tv}^2 / s^2$$


with Snedecor’s F distribution (10:222-225) with
degrees of freedom $n_1 = (N - 1)(m - 1)$ and
$n_2 = Nm(n - 1)$. If $H_1$ is accepted, it is concluded
that $\sigma_{tv}^2 = 0$, and Table I is superseded by an
analysis of the form shown in Table II. For $\sigma_{tv}^2$
in equations (3) to (6) a zero is substituted, and
the estimation equations for the variance components
become:


$$\sigma^2 \;(=)\; s_e^2$$
$$\sigma_t^2 \;(=)\; (s_t^2 - s_e^2) / mn \tag{8}$$


Since $\sigma_{tv}^2 = 0$, $H_0$ is tested by comparing

$$F_0 = s_t^2 / s_e^2$$

with the tables of the F distribution with degrees
of freedom $n_1 = (N - 1)$ and $n_2 = N(mn - 1) - (m - 1)$.

If $H_1$ is rejected, it is concluded that $\sigma_{tv}^2 > 0$,
and $H_0$ is tested by

$$F_0 = s_t^2 / s_{tv}^2$$

with the F tables with degrees of freedom $n_1 =
(N - 1)$ and $n_2 = (N - 1)(m - 1)$.

If $H_0$ is accepted, it is best to conclude that
the reliability of the scale is zero; if $H_0$ is rejected,
it is proper to estimate R as indicated in equations
(4) and (7).
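The two-stage testing procedure just described may be sketched as below. The function name `f_tests`, the flag `h1_rejected` (standing for the outcome of comparing the first ratio with the tabled F value, which must be looked up separately), and the mean squares in the usage lines are hypothetical. When $H_1$ is accepted, the sketch pools the visit-error and observer-error sums of squares into the single error term of Table II.

```python
def f_tests(ms_teachers, ms_visit_error, ms_observer_error, N, m, n, h1_rejected):
    """Return the F ratios and degrees of freedom used to test H1 and H0."""
    df_visit_error = (N - 1) * (m - 1)
    df_observer = N * m * (n - 1)

    # H1: the teacher-by-visit (visit error) component is zero.
    f1 = ms_visit_error / ms_observer_error
    df1 = (df_visit_error, df_observer)

    if h1_rejected:
        # Visit error is real, so teachers are tested against it.
        f0 = ms_teachers / ms_visit_error
        df0 = (N - 1, df_visit_error)
    else:
        # No interaction: pool both error sources (Table II).
        df_error = N * (m * n - 1) - (m - 1)
        ss_error = (ms_visit_error * df_visit_error
                    + ms_observer_error * df_observer)
        f0 = ms_teachers / (ss_error / df_error)
        df0 = (N - 1, df_error)

    return {"F1": f1, "df1": df1, "F0": f0, "df0": df0}


# Hypothetical mean squares for one tryout with N = 11, m = 3, n = 2.
rejected = f_tests(60.0, 18.0, 6.0, N=11, m=3, n=2, h1_rejected=True)
accepted = f_tests(60.0, 18.0, 6.0, N=11, m=3, n=2, h1_rejected=False)
```

Each returned ratio is then referred to the F table with the paired degrees of freedom.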


Application of the Design to Tryouts of the 
Cornell Technique 


The first of the two techniques employed in this 
study was developed by Francis G. Cornell and
his associates at the University of Illinois. For 
the purposes of this investigation, Cornell’s tech- 
nique was modified slightly; readers interested 
in the original form should consult the monograph 
in which it was originally presented (3); the modi- 
fied form is described in a monograph by Medley 
and Mitzel (9). 

Six observers participated in the tryouts. They 
visited 33 teachers in teams of two observers 
each. Each of the six observers saw each of the 
33 teachers once, so that the total number of 
scores on each of the eight dimensions was 198.
The six observers were grouped into one set of 
three teams and the first eleven teachers were 
visited by each team, no two teams visiting the 
same teacher on the same day. The six observ- 
ers were then rearranged into a different set 
of three teams, and eleven more teachers were 
visited. Finally, the team composition was 
changed a third time and the remaining eleven 
teachers were visited. 

The simplest way of analyzing these data is to 
regard each series of visits to eleven teachers as 
a distinct tryout. In this case, N = 11, m = 3, and
n = 2. The design in Table I could be used to analyze
the results of each tryout separately.

If it is assumed that all 33 teachers may be re- 
garded as having been drawn at random from the 
same population of teachers, and that all of the 
nine teams used may be regarded as having been 
drawn at random from the same population of 
teams, then it is reasonable to expect that the cor- 
responding observed mean squares in different 
tryouts estimate parameters of the same popula- 
tion, and the respective sums of squares may be 
pooled to yield more precise estimates of the parameters.
Since there are 198 scores in all, yielding
a total of 197 degrees of freedom, and since
each tryout employs 66 scores, yielding 65 de- 
grees of freedom per tryout, or a total of 195, 
there are two degrees of freedom remaining. 
These two degrees of freedom may (under the as- 
sumption stated) be used to estimate the ‘‘teach- 
er’’ mean square from differences between 
groups of teachers, making a total of 32 degrees 
of freedom available for this purpose. The complete
design is shown in Table III.
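The bookkeeping of degrees of freedom in the pooled analysis can be checked with a few lines of arithmetic. The sketch below assumes only the figures stated in the text (three tryouts, each with N = 11, m = 3, n = 2); the helper name `tryout_df` is hypothetical.

```python
def tryout_df(N, m, n):
    """Degrees of freedom for one tryout, following the plan of Table I."""
    return {"teachers": N - 1,
            "visits": m - 1,
            "visit_error": (N - 1) * (m - 1),
            "observer_error": N * m * (n - 1)}


TRYOUTS, N, m, n = 3, 11, 3, 2

per_tryout = tryout_df(N, m, n)
pooled = {source: TRYOUTS * df for source, df in per_tryout.items()}

total_df = TRYOUTS * N * m * n - 1             # 198 scores in all
within_tryouts = TRYOUTS * (N * m * n - 1)     # 65 per tryout
between_groups = total_df - within_tryouts     # left over for groups of teachers

# The remaining degrees of freedom are assigned to the teacher mean square.
pooled["teachers"] += between_groups
```

The pooled entries then account for every one of the 197 degrees of freedom.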


TABLE I


PLAN FOR RELIABILITY ANALYSIS OF VARIANCE OF
SCORES ON A BEHAVIORAL DIMENSION


Source of                               Mean Squares
Variation        d.f.              Observed     Expected

Teachers         N - 1             $s_t^2$      $\sigma^2 + n\sigma_{tv}^2 + mn\sigma_t^2$
Visits           m - 1             $s_v^2$      $\sigma^2 + n\sigma_{tv}^2 + Nn\sigma_v^2$
Visit Error      (N - 1)(m - 1)    $s_{tv}^2$   $\sigma^2 + n\sigma_{tv}^2$
Observer Error   Nm(n - 1)         $s^2$        $\sigma^2$

Total            Nmn - 1


TABLE II


PLAN FOR RELIABILITY ANALYSIS OF SCORES ON A BEHAVIORAL DIMENSION WHEN
THERE IS NO INTERACTION BETWEEN TEACHERS AND VISITS


Source of                                    Mean Squares
Variation        d.f.                   Observed     Expected

Teachers         N - 1                  $s_t^2$      $\sigma^2 + mn\sigma_t^2$
Visits           m - 1                  $s_v^2$      $\sigma^2 + Nn\sigma_v^2$
Error            N(mn - 1) - (m - 1)    $s_e^2$      $\sigma^2$

Total            Nmn - 1




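TABLE III


COMPLETE DESIGN FOR ANALYZING THE SCORES OBTAINED ON A CORNELL
SCALE IN THE SERIES OF THREE TRYOUTS


Source of                        Mean Squares
Variation        d.f.       Observed     Expected

Teachers          32        $s_t^2$      $\sigma^2 + 2\sigma_{tv}^2 + 6\sigma_t^2$
Visits             6        $s_v^2$      $\sigma^2 + 2\sigma_{tv}^2 + 22\sigma_v^2$
Visit Error       60        $s_{tv}^2$   $\sigma^2 + 2\sigma_{tv}^2$
Observer Error    99        $s^2$        $\sigma^2$

Total            197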


TABLE IV

RELIABILITY ANALYSIS OF DIFFERENTIATION SCORES


Source of
Variation        d.f.    Sum of Squares    Mean Square

Teachers           32       1828.3132        57.1348
Visits              6        149.9696        24.9949
Visit Error        60       1044.6970        17.4116
Observer Error     99        664.5000         6.7121

Total             197       3687.4798


As an example, the complete analysis of variance
for the Differentiation scale, based on the
three tryouts, is shown in Table IV.

The test of $H_1$ ($\sigma_{tv}^2 = 0$) yielded an F ratio
of 2.59, which is beyond the .01 point of the tabled
distribution; it was, therefore, concluded
that $\sigma_{tv}^2$ is greater than zero, and that variations
in teacher performance from day to day
are a factor contributing to the error in Differentiation
scores.

The test of $H_0$ ($\sigma_t^2 = 0$) yielded an F ratio of
3.28, which is also beyond the .01 point. It was,
therefore, concluded that $\sigma_t^2$ is greater than
zero, and that the Differentiation scale discriminates
teachers with a reliability greater than
zero.

When estimated according to equation (7), the
components of the variance of a score were taken
to be as follows:


$$\sigma^2 \;(=)\; 6.71$$
$$\sigma_{tv}^2 \;(=)\; 5.35$$
$$\sigma_t^2 \;(=)\; 6.62$$


The estimated reliability of a Differentiation
score based on a single record for one 25-minute
visit is:


$$r = (6.62) / (6.62 + 5.35 + 6.71) = .35$$


and the estimated coefficient of observer agreement is:


$$r' = (11.97) / (18.68) = .64$$


The r of .35 indicates that 35 percent of the variance
of the scale is due to differences among
teachers; 65 percent must, then, be due to errors
of measurement. From the estimated components
we calculate that 29 percent of the variance
is due to visit-to-visit variations in teacher
behavior, and 36 percent to discrepancies between
different observers’ records of the same
behavior.
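The arithmetic behind these figures can be retraced from the mean squares of Table IV with m = 3 visits and n = 2 observers per team. The following sketch simply applies equations (7), (4), and (5); the function and variable names are illustrative assumptions.

```python
def variance_components(ms_teachers, ms_visit_error, ms_observer_error, m, n):
    """Estimate (true score, visit error, observer error) components, eq. (7)."""
    s2 = ms_observer_error
    s2_tv = (ms_visit_error - ms_observer_error) / n
    s2_t = (ms_teachers - ms_visit_error) / (m * n)
    return s2_t, s2_tv, s2


# Mean squares for the Differentiation scale as reported in Table IV.
s2_t, s2_tv, s2 = variance_components(57.1348, 17.4116, 6.7121, m=3, n=2)
total = s2_t + s2_tv + s2

r = s2_t / total                    # reliability of a single record, eq. (4)
r_prime = (s2_t + s2_tv) / total    # coefficient of observer agreement, eq. (5)
share_visit = s2_tv / total         # proportion due to visit-to-visit variation
share_observer = s2 / total         # proportion due to observer discrepancies

print(f"components: {s2_t:.2f} {s2_tv:.2f} {s2:.2f}")
print(f"r = {r:.2f}, r' = {r_prime:.2f}")
print(f"visit share = {share_visit:.2f}, observer share = {share_observer:.2f}")
```

The printed values agree with the components and coefficients quoted in the text to two decimal places.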

A similar analysis was carried out for each 
of eight scales employed in Cornell’s technique. 
The results are summarized in Table V which 
gives the estimates of variance components and 
the coefficients of reliability and observer agree- 
ment. 

Three of the scales did not detect differences 
among the teachers in this study—Pupil Climate, 
Pupil Initiative, and Content. In the instances 
of Pupil Initiative and Content there is evidence 
that observers were able to agree on scores 
based on a single performance to the extent
necessary to achieve correlations of .43 and .23,
respectively; but there was so much variation
from one performance to another that no stable 
difference among teachers was detected. The 
Pupil Climate dimension was not observable by 
the six observers employed—no two of them could 
agree about the score to be assigned a given per- 
formance observed by both. 

There are two scales—Variety and Teacher 
Climate—for which no error due to instability of 
teacher performance was detected. This sug- 
gests that the performance of teachers in these 
respects is relatively stable. 

The reliabilities of the best five scales were 
all of the same order of magnitude, ranging from 
.32 to .42. None of these values seems to be large 
enough to be used for estimating the typical score 
of an individual teacher. However, it is apparent
from equation (6), which may be written:


$$R_{mn} = \sigma_t^2 / [\sigma_t^2 + \sigma_{tv}^2/m + \sigma^2/mn]$$


that by increasing either n (the number of observers
on a team; that is, the number in the classroom
at one time) or m (the number of visits made
to the classroom) the reliability can be increased.
If n is allowed to increase without limit while m
remains the same, $R_{mn}$ approaches a value smaller
than one, because increasing n reduces observer
errors only, while increasing m reduces both
types of error. The question of an optimal way
of distributing a fixed number of observer-hours
(that is, how large to make n when the number
of observer-hours mn is fixed) may be answered
on the basis of the graph shown in Figure 1.

Figure 1 shows the reliability of five Cornell
scales as a function of n, the number of observers
visiting a teacher at the same time, when mn
(the number of observations) is equal to twelve.*
Thus, if one observer visits one teacher at a time,
twelve visits must be made, each observer visiting
the teacher on a different day; if two observers
visit the teacher at one time, six visits must
be made, and so on, up to the case n = 12, in
which all twelve observers must visit the teacher
at one time.

The curves in Figure 1 show unmistakably 
that the reliability of each of the scales falls off 
as team size increases. For a given cost per ob- 
server-hour, there seems little doubt that the 
greatest precision per dollar spent is obtained by 
sending observers into classrooms one by one. 
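The trade-off between team size and number of visits can be illustrated numerically. The sketch below evaluates equation (6) for the Differentiation scale, using the component estimates quoted earlier (6.62, 5.35, and 6.71), with the number of observations mn held at twelve; the function name `reliability_of_mean` is an assumption made for illustration.

```python
def reliability_of_mean(s2_t, s2_tv, s2, m, n):
    """Reliability of a mean over m visits by teams of n observers, eq. (6)."""
    return (m * n * s2_t) / (m * n * s2_t + n * s2_tv + s2)


# Estimated components for the Differentiation scale.
S2_T, S2_TV, S2 = 6.62, 5.35, 6.71
OBSERVATIONS = 12

team_sizes = [1, 2, 3, 4, 6, 12]
reliabilities = [
    reliability_of_mean(S2_T, S2_TV, S2, m=OBSERVATIONS // n, n=n)
    for n in team_sizes
]

for n, value in zip(team_sizes, reliabilities):
    print(f"n = {n:2d}, m = {OBSERVATIONS // n:2d}: reliability = {value:.2f}")
```

Under these estimates the reliability falls steadily as the team grows, from about .87 with twelve single-observer visits to about .53 with one visit by twelve observers.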

It might be remarked in passing that if the 
same observer visits a teacher twelve times, the 
reliability is substantially higher than when 
twelve different observers visit the teacher once 
each, because in the former case $\sigma^2 = 0$, and
hence the quotient in equation (6) is greater. How-


*When these curves were plotted, the estimate of each component was computed according to equation 
(7), whether or not the component could be shown to be different from zero. 




TABLE V


SUMMARY OF RESULTS OF RELIABILITY ANALYSES OF EIGHT CORNELL SCALES


                            Components of Variance
                       True     Visit    Observer             Reliability    Coefficient of
Scale                  Score    Error    Error       Total    Coefficient    Observer Agreement

Activity               20.49    10.53    18.63       49.65    [illegible]    [illegible]
Variety                 1.51     0.00     2.13        3.64    [illegible]    [illegible]
Pupil Climate           0.00     0.00     1.85        1.85       .00            .00
Teacher Climate         2.75     0.00     5.78        8.53       .32            .32
Social Organization     3.82     2.96     3.52       10.30       .37         [illegible]
Differentiation         6.62     5.35     6.71       18.68       .35            .64
Pupil Initiative        0.00     7.26     6.43       13.69       .00            .43
Content                 0.00    10.34    34.71       45.05       .00            .23


TABLE VI


ANALYSIS OF VARIANCE OF SCORES FOR WITHALL’S CATEGORY 1:
LEARNER-SUPPORTIVE STATEMENTS


Source of                  Mean Squares
Variation            Observed     Expected

Teachers              81.015      $\sigma^2 + 2\sigma_{tv}^2 + 16\sigma_t^2$
Visits                98.623      $\sigma^2 + 2\sigma_{tv}^2 + 8\sigma_v^2$
Visit Error           24.468      $\sigma^2 + 2\sigma_{tv}^2$
Observer Error         1.703      $\sigma^2$

Total


FIGURE 1. The Reliability of Certain Cornell Scales as a Function of Team Size
when the Total Number of Visits is Twelve. (Horizontal axis: team size; curves
labeled Differentiation, Variety, Activity, and Teacher Climate.)


ever, such reliability is probably gained at the
expense of validity, because observer biases
are not cancelled out, but remain to distort
teacher differences.

Figure 2 shows the reliability to be expected
for any number of visits (by a single observer)
up to twenty. The rate of increase varies for
different scales, but all of them level off. If a
reliability of .90 is required for a particular
purpose, sixteen visits would have to be made
if Differentiation, Social Organization, Variety,
and Activity are to be scored. Twenty visits
will bring Teacher Climate scores up to this level
also.


Application of the Design to the Tryouts
of Withall’s Technique


The second of the two techniques employed
was based on Withall’s categories of verbal behavior
(15). The procedure used has been described
elsewhere (8). The method consists in
classifying the statements made by a teacher into
seven mutually exclusive categories: 1. Learner-supportive;
2. Acceptant or clarifying;
3. Problem-structuring; 4. Neutral; 5. Directive;
6. Reproving, disapproving or disparaging;
and 7. Teacher-supportive. The first three categories
were combined and the category obtained
was called "Learner-centered" statements.
The ratio of the sum of these three statement
categories to the total number of statements
made by a teacher is called the "Climate Index."

The tryouts with the Withall technique employed
two observers working as a team with
four teachers in a single elementary school.
Each teacher was visited by the team of two observers
on four occasions about a week apart.
The observers remained in the classroom during
each visit until approximately 100 statements
had been classified. After comparing notes and
clarifying the definitions of the categories, the
same two observers visited each of the four
teachers four more times at one-week intervals.
Thus, there were available, finally, a total of 64
counts in each category, corresponding to eight
visits by two observers to four teachers; in the
notation of this report, m = 8, n = 2, N = 4.

The count for each category for one period
was divided by the total number of remarks tallied
in that period. The proportion so obtained
was then transformed to an angle measured in
degrees by the use of the arc sine transformation
(12:449-50). Such scores have the advantage
of having standard errors of measurement
which are independent of the magnitude of the
score.

The design used for analyzing each category
of response was as shown in Table VI, which
also gives the results for Category 1: Learner-supportive
statements.

For these data, $H_1$ ($\sigma_{tv}^2 = 0$) was rejected and
$H_0$ ($\sigma_t^2 = 0$) remained in doubt, since the F ratio
of 3.31 falls between the .01 and .05 points of the
F distribution. The components of variance attributable
to observer error, visit error, and differences
among teachers were estimated to be
1.703, 11.383, and 4.552 respectively.

Similar analyses were made of the scores for
categories 3, 4, 5, 6, and the Climate Index defined
above. The results are presented in Table
VII in the form of estimates of variance components
and reliability coefficients. As before, components
not found to differ significantly from zero
are reported equal to zero; and when $\sigma_t^2$ does not
differ from zero, the corresponding reliability
coefficient is reported as zero. No analysis of
categories 2 and 7 was made since remarks classified
in these categories were so rare that an analysis
of them did not promise to be fruitful.

Two categories failed to show reliability greater
than zero (Neutral and Reproving), and one
(Learner-supportive) remained in doubt. The low
coefficient of observer agreement reported for
Neutral statements indicates that the definition of
this category may be unclear; that for Reproving
statements is high enough to suggest that the failure
of the scores to discriminate teachers is principally
due to instability of this aspect of teacher
behavior. This instability is also reflected in the
relatively larger component of variance due to visit
errors. The same conclusion is indicated regarding
Learner-supportive statements. The remaining
three categories have reliability coefficients
around .50.

Curves like those in Figures 1 and 2 can be
plotted for these data. When plotted, the curves
indicated that all three of the "reliable" scales
(Problem-structuring, Directive, and Climate
Index) should reach a reliability of .90 when they
are based on ten visits.


Discussion


The results of the analyses presented above
illustrate some of the practical advantages of the
analysis of variance over correlation analysis in
estimating the reliability of observational data
when more than two scores per person are available.
First, the analysis of variance yields a
single estimate of the reliability coefficient which
uses all of the information contained in the data;
second, the analysis of variance makes it possible
to partition the error into components attributable
to different sources; and, finally, it is possible
(if the necessary assumptions are fulfilled)
to test the significance not only of the coefficient
obtained, but also of each component of error.

In the study of the Cornell technique, for example,
no fewer than 36 correlation coefficients
estimating the reliability of the technique could
be obtained, all equally accurate. Each such estimate
would be a product-moment coefficient
based on 11 cases, so that the accuracy of any
one would be low. Moreover, each estimate
would have a considerable bias (4:205). A mean
of the 36 coefficients could be used, but the bias
would remain (or perhaps increase, since the estimates
would not be independent); and nothing
is known about the sampling error in such a
mean. The estimate obtained by analysis of variance,
however, is unbiased (4:225), unique, and
of known precision.

[Figure: Reliability of Certain Cornell Scales]

In each of the examples given we have parti- 
tioned our error variance into two components, 
and shown how such a partition of errors can be 
used in planning further uses of the observation- 
al technique by indicating where some of the er- 
rors originate, and can yield estimates of differ- 
ent correlations. A different design could separ- 
ate errors due to differences in behaviors ob- 
served on different days from differences in be- 
haviors observed on the same day; errors due to 
differences between observers from differences 
in what a given observer sees in different five- 
minute periods during the same visit, etc. A 
‘‘reliability’’ coefficient corresponding to each 
type of error could be estimated, the relative 
importance of each source of error could be as- 
sessed, and plans for future observations could 
then be made more intelligently. 

When an instrument of low reliability is tried 
out on a small scale, as is usually the case when 
the instrument requires a rather large expendi- 
ture of a trained observer’s time before even 
one measurement is obtained, it is essential that 
it be possible to test the hypothesis that the true 
reliability of the scale is zero, as well as to estimate
its magnitude, since sometimes a reliability
large enough to appear useful may be nonsignificant.
Such tests are easily made as part
of the analysis of variance. It is also possible 
to test whether a certain suspected source of er- 
ror (as, for example, observers’ fallibility) is 
in fact making a significant contribution. 

These advantages of analysis of variance 
clearly indicate the unsuitability of the corre- 
lational method when the data available include 
more than two independent measures of each in- 
dividual. The only situation in which the latter 
method might be useful is that in which a set of 
N pairs of scores on equivalent forms of a test
are available. Indeed, the reliability coefficient 
is often defined as the correlation between 
scores on equivalent tests in the population of in- 
dividuals. It is natural to assume that the corre- 
lation in the sample is the appropriate estimate 
of the correlation in the population. But when 
the correlation in question is a reliability coeffi- 
cient this is not true. 

If we are correlating a test x and another
measure y, the population correlation may be
written as follows:


$$R_{xy} = \sigma_{xy} / (\sigma_x \sigma_y),$$


where $\sigma_x$ and $\sigma_y$ are the standard deviations of the
two measures and $\sigma_{xy}$ is their covariance. The
appropriate estimate of $R_{xy}$ from a sample of N
pairs of scores x and y is the product-moment or
interclass correlation, which may be written:


$$r_{xy} = s_{xy} / (s_x s_y), \tag{9}$$


where $s_{xy}$, $s_x$, and $s_y$ estimate $\sigma_{xy}$, $\sigma_x$, and $\sigma_y$.

But if we are correlating a test x and an equivalent
test x', the population correlation is


$$R_{xx'} = \sigma_{xx'} / \sigma_x^2,$$


where $\sigma_x^2$ is the variance of either test, and $\sigma_{xx'}$
the covariance of the two. The product-moment
correlation coefficient used to estimate this would
be


$$r_{xx'} = s_{xx'} / (s_x s_{x'}),$$


where $s_x^2$ and $s_{x'}^2$ are estimates of $\sigma_x^2$ from
each of the two tests, and $s_{xx'}$ is the estimated
covariance. It is clear that we are using the geometric
mean of two sample estimates to estimate
the population variance, $\sigma_x^2$.

If we analyze the total variance of the N pairs 
of scores x and x" into two components, one from 
comparisons between individuals, with N-1 de- 
grees of freedom, and onefrom comparisons with- 
in individuals or ‘‘error’’, with N degrees of free- 
dom, we can obtain the intraclass correlation r 
as follows: 


(mean square between) - 
(mean square between) + 


(mean square for error) (10) 
(mean square for error) 


This is an estimate of $R_{xx'}$ which may be
shown to be related to the interclass correlation
$r_{xx'}$ as follows:


$r = (2 S_x S_{x'} r_{xx'} - K) / (S_x^2 + S_{x'}^2 + K)$, (11)


where $r_{xx'}$, $S_x$, and $S_{x'}$ are as defined above,
$\bar{x}$ and $\bar{x}'$ are the test means, and


$2NK = N(\bar{x} - \bar{x}')^2 + 2 S_x S_{x'} r_{xx'} - S_x^2 - S_{x'}^2$.


It can be shown that K is never negative, and
that when K is greater than or equal to zero, r is
smaller than $r_{xx'}$. We may, therefore, say that
the estimate $r_{xx'}$ is always greater than the esti-
mate r.
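
As a rough numerical check on the relation just described, the following Python sketch computes the two-component intraclass estimate of formula (10) directly from the sums of squares and again through formula (11). The scores and function names are made up for illustration and do not come from the paper. Two assumptions are made: sample variances use N-1 denominators, and formula (11) is read as r = (2 S_x S_x' r_xx' - K) / (S_x^2 + S_x'^2 + K).

```python
from statistics import mean, variance


def intraclass_two_component(x, x2):
    """Formula (10) with between (N-1 df) and within (N df) mean squares."""
    n = len(x)
    grand = (sum(x) + sum(x2)) / (2 * n)
    # Each individual contributes two scores, so pair means are weighted by 2.
    ss_between = sum(2 * ((a + b) / 2 - grand) ** 2 for a, b in zip(x, x2))
    # The two scores deviate from their pair mean by +/-(a - b)/2.
    ss_within = sum((a - b) ** 2 / 2 for a, b in zip(x, x2))
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / n
    return (ms_between - ms_within) / (ms_between + ms_within)


def intraclass_formula_11(x, x2):
    """The same estimate via the assumed reading of formula (11)."""
    n = len(x)
    mx, mx2 = mean(x), mean(x2)
    vx, vx2 = variance(x), variance(x2)  # S_x^2 and S_x'^2
    # Sample covariance, equal to S_x * S_x' * r_xx'.
    cov = sum((a - mx) * (b - mx2) for a, b in zip(x, x2)) / (n - 1)
    k = (n * (mx - mx2) ** 2 + 2 * cov - vx - vx2) / (2 * n)
    return (2 * cov - k) / (vx + vx2 + k)


# Hypothetical scores for three individuals on two equivalent tests.
x = [1, 2, 3]
x2 = [2, 2, 5]
print(intraclass_two_component(x, x2))
print(intraclass_formula_11(x, x2))
```

For these scores both routes give 8/13, about 0.615, against a product-moment correlation of about 0.866.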

Fisher (4:205, 211 ff.) points out that $r_{xx'}$ sys-
tematically overestimates $R_{xx'}$, but that r does
not, and that the latter estimate is more precise
than the former. The bias in $r_{xx'}$ is small ex-
cept when r is small, and the difference in pre-
cision is slight.

The procedure usually followed in actual 
practice is to analyze the variance of the 2N 
scores into three components rather than two; 
one for differences between individuals, with 
N - 1 degrees of freedom, one for differences 
between test means with one degree of freedom, 
and one for residual or error, with N-1 de- 
grees of freedom. The estimate r of reliability, 
as defined in formula (10) above, may then be 
written: 


$r = 2 S_{xx'} / (S_x^2 + S_{x'}^2)$ (12)


Comparing this with formula (9) we see that r
differs from $r_{xx'}$ in that it uses the arithmetic
mean of the two sample estimates of $\sigma_x^2$ instead
of the geometric mean.
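
The arithmetic-mean versus geometric-mean comparison can be illustrated with a short Python sketch. It carries out the three-component analysis (individuals, tests, residual) on made-up scores and compares the result with the covariance divided by each kind of mean of the two sample variances. The data and helper names are hypothetical, and variances use N-1 denominators.

```python
from math import sqrt
from statistics import mean, variance


def three_component_r(x, x2):
    """Formula (10) with the test-mean difference removed from error."""
    n = len(x)
    mx, mx2 = mean(x), mean(x2)
    grand = (mx + mx2) / 2
    ss_total = sum((a - grand) ** 2 for a in x) + sum((b - grand) ** 2 for b in x2)
    ss_individuals = sum(2 * ((a + b) / 2 - grand) ** 2 for a, b in zip(x, x2))
    ss_tests = n * ((mx - grand) ** 2 + (mx2 - grand) ** 2)
    ss_error = ss_total - ss_individuals - ss_tests
    ms_between = ss_individuals / (n - 1)  # N - 1 degrees of freedom
    ms_error = ss_error / (n - 1)          # N - 1 degrees of freedom
    return (ms_between - ms_error) / (ms_between + ms_error)


def covariance(x, x2):
    """Sample covariance with an N - 1 denominator."""
    mx, mx2 = mean(x), mean(x2)
    return sum((a - mx) * (b - mx2) for a, b in zip(x, x2)) / (len(x) - 1)


# Hypothetical scores for three individuals on two equivalent tests.
x = [1, 2, 3]
x2 = [2, 2, 5]
vx, vx2, cov = variance(x), variance(x2), covariance(x, x2)

r_anova = three_component_r(x, x2)
r_arithmetic = cov / ((vx + vx2) / 2)  # arithmetic mean of the variances
r_geometric = cov / sqrt(vx * vx2)     # geometric mean, as in formula (9)
print(r_anova, r_arithmetic, r_geometric)
```

Here the analysis of variance and the arithmetic-mean form both give 0.75, while the product-moment form gives about 0.866. Because the arithmetic mean of two positive numbers is never below their geometric mean, the first estimate cannot exceed the second when the covariance is positive.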

The internal consistency reliability of a test 
may be estimated from an analysis of variance 
of N pairs of half-test scores in either of the two 
designs described above from the formula: 


$\text{reliability} = \dfrac{(\text{mean square between}) - (\text{mean square for error})}{(\text{mean square between})}$


Summary 


A procedure for estimating the reliability of 
scores based on observations of behaviors was 
described, and its use illustrated in two some- 
what different situations. A discussion of the 
relative merits of analysis of variance and cor-
relational analysis as techniques for estimating
reliability coefficients led to the conclusion that 
the former has three distinct advantages over 
the latter. It yields a single best estimate of re- 
liability; it supplies independent measures of the 
amount of error from different sources, and it 
provides for simple, exact tests of significance. 
When only two sets of measurements are avail- 
able, an estimate of reliability may be obtained 
by correlational analysis, but it is biased and 
has a larger sampling error than that obtained 
by analysis of variance. When more than two 
sets of measurements are available, no satis-
factory estimate can be obtained by correlation-
al analysis. We, therefore, suggest that the
use of the correlational technique be limited to 
validity estimation, and that the analysis of var- 
iance be adopted as the standard procedure for 
estimating reliability. 
SECTION I


Statement of the Problem 


IN A LARGE city school system employing 
over nine-thousand teachers it is readily conceiv- 
able that there will be many differences among 
teachers in terms of their methods, philosophy, 
goals, behavior, and their attitudes toward chil-
dren and adults. Even in a single school, wide
differences are not hard to find, yet most schools
seem to operate in a fairly smooth, efficient, 
and productive manner. Looking still more 
closely, those of us who are on the ‘‘inside’’ of 
the educational scene have seen, in all probabil- 
ity, evidence of differences of varying degrees 
within even the smallest formal school-unit, the 
department. It is this unit, considered as a func-
tioning group, with which our study was con-
cerned. Specifically, the problem was to ana-
lyze and describe what happened when a teacher-
administrator group initiated, organized, con-
ducted, and evaluated a curriculum improvement
project in one department of an urban junior high 
school over a forty-week period. 


Major Hypotheses, Related Questions 


The plan and procedures of this study were 
aimed at finding answers to the following ques- 
tions: 


1. What changes, if any, occur in the teach- 
ers’ perceptions of their own responsibilities in 
teaching when they become involved in coopera- 
tive curriculum improvement? 

2. What will be the outcomes of a small devel- 
opmental study which seeks to stimulate by large- 
ly non-directive means the improvement of in- 
struction in a single department within one ur- 
ban junior high school? 

3. What conditions or factors appeared to be
influential in tending to make the teachers produc-
tive and creative in the group?


Five Major Hypotheses 


The hypotheses tested in this study are stated 
below. 


1. There will be tangible modifications in 
classroom practices of the teachers in- 
volved in the study. 

2. A variety of instructional methods will be 
tried and tested. 

3. There will be an increase in the confidence
of teachers in defining problems. 

4. Teachers will feel increasingly secure in 
exchanging suggestions with each other. 

5. The Head of the English Department and the 
Principal will allow and encourage teachers 
to try, test, and develop newer methods, 
techniques, and courses. 


Related to the testing of these hypotheses,
some data bearing on the questions below were
sought and examined.


1. What are the strong points of the action re- 
search method and the work-group-confer- 
ence technique? 

2. Under what conditions can these methods 
best be used in creating curriculum change? 

3. What are the limitations of the methods and 
their application? 


Two Assumptions Underlying the Study


Two basic assumptions were made by the writ- 
er at the outset of the study. 


Assumption One: The curricular experiences
of pupils are determined in large measure 
by the values, goals, skills, and attitudes 


*An abstract of an unpublished Ed.D. dissertation, Wayne State University, 1957. 


held by teachers. 

Assumption Two: In order to change the cur-
riculum it follows that an attempt must be
undertaken to change the values,
goals, skills, and attitudes of the people
involved in respect to education (2), but
more specifically in respect to interper-
sonal relations among members of a work-
ing group.


The ideas expressed in these assumptions were 
accepted and shared, at the beginning of the 
study, by the Supervisor of Language Education, 
the principal of the participating school, and the 
writer. In addition, the Director of Language 
Education for the city school system also en- 
dorsed these assumptions when he actively 
launched the project. 


Background and Significance of Curriculum 
Development 


One of the greatest chronic problems in edu- 
cation in the United States during the past thirty- 
five years has been the apparent lack of utiliza- 
tion of research findings by teachers in the na- 
tion’s classrooms. Tremendous quantities of 
research results fill library shelves and, al-
though a great part of these research findings
could be of inestimable value, they have been 
barely tapped. There are many reasons for this 
rejection or ignoring of research on the part of 
teachers, but, reasons or not, unless this pat- 
tern is changed the youth of our country will con- 
tinue to pay the price in the form of less full, 
beneficial, crucially-needed education. 

The background of one-hundred years of cur- 
riculum development is briefly and concisely 
presented in an NEA bulletin, 100 Years of Cur- 
riculum Improvement, 1857-1957 (1). Prepared 
by the Association for Supervision and Curricu-
lum Development, the bulletin traces major
changes in the concept of learning, teaching and 
curriculum improvement. Briefly, some of 
these are: 


1. The change from the faculty psychology of 
learning to an organismic, dynamic psy- 
chology... with emphasis on meaning, goal- 
seeking and integration in the learning pro- 
cess. 

2. Change from reliance on tradition and sub-
jective judgment...to concern for scientif-
ic research and the application of scientif-
ic methods and findings.

3. Changes in methods and materials (used 
in teaching) have grown out of the idea that 
how we learn is as important as what we
learn.

4. Changes in our total approach to children
in learning situations have been influenced
by the findings in the field of Child Growth
and Development.

5. Changes in patterns of participation in cur-
riculum building:

a) Fixed body of subject matter set by ‘‘ex- 
perts’’ to 

b) Shared participation—teachers, pupils, 
lay people led by 

c) Administrators, supervisors, and re- 
source persons. 


Action Research 


Action research, based on the ‘‘field theory’’ 
psychology of Kurt Lewin (5), is the newest re-
search approach to educational problems because
it has within it the potential to apply experimental 
social psychology to ‘‘natural’’ social groups. 
Good action research employs mathematical 
means of measurement and testing, statistical an- 
alysis and other tools of fundamental research. 


Teacher Participation 


During the past one hundred years the changes 
in ‘‘Who shall build the curriculum?’’ have been 
most marked. The current view that teachers 
should be given the opportunity to participate in 
curriculum improvement is perhaps the biggest 
step forward in the direction of actually bringing 
results of research into the classroom. Admin- 
istrators, teachers and supervisors have an op- 
portunity now, as never before, to work together 
cooperatively to improve all phases of education. 


Factors in Cooperative Effort 


To create actual improvement in the class-
room it is imperative that teachers understand,
appreciate, develop and apply research. In order
for them to do this they must be given opportun-
ity to learn, change and improve. Administra-
tors and supervisors must afford this kind of op-
portunity. They must structure a framework
which is conducive to good personal relationships
and gives a chance for free expression and develop-
mental acquisition of research skills. Finally,
the administrators and supervisors must be will- 
ing to support cooperatively created changes in 
the total curriculum. 


The Work-Group-Conference Method 


Meier, Cleary and Davis (6) drew on a num-
ber of fields to create a technique of cooperative
action which they labeled the ‘‘work-group-con-
ference method’’. This method is one of the new-
er tools with which a good human relations,
action research approach can be realized func-
tionally. It seems to have within it the potential
to release the creative and productive talents of
people working and acting in harmony. It seems
to be a technique by which supervisors, consult-
ants, and staff generally, as well as principals
and other administrators, can achieve successful
improvement of their own and others’ behavior
and, consequently, clearer thinking and sharper
action to reduce problems inherent in the educa-
tional scene.


A Means of Problem Attack, Action, and 
Developing Research Abilities 


The work-group-conference method lends it- 
self ideally to problem solving because it has a 
social-psychological basis which encompasses
the total aspects of the individual, the group, 
and the environment in which these operate at a 
given time. A supervisor, when facing instruc- 
tional problems, can employ the technique of 
work-group-conference method in an action re- 
search frame and lead in helping teachers to 
solve the problems in a cooperative and scien- 
tific manner. 

Stephen M. Corey lists the following as ‘‘sig-
nificant elements of a design for action re-
search’’ (4):


1. The identification of a problem area about 
which an individual or a group is sufficient- 
ly concerned to want to take some action. 

2. The selection of a specific problem and the
formulation of a hypothesis or prediction 
that implies a goal and a procedure for 
reaching it. This specific goal must be 
viewed in relation to the total situation. 

3. The careful recording of actions taken and 
the accumulation of evidence to determine 
the degree to which the goal has been 
achieved. 

4. The inference from this evidence of gener-
alizations regarding the relation between
the actions and the desired goal.

5. The continuous retesting of these generali- 
zations in action situations. 


Bases of Action Research (Summary) 


1. It is based on the social dynamics theories
of Kurt Lewin.

2. The psychological basis of the social dyn-
amics theory is grounded on what is gener-
ally termed the ‘‘field’’ theory-type of psy-
chological action.

3. Action research is usually carried out in a
field setting in contrast to a laboratory set-
ting.

4. It is an extension of basic social research
and includes in its methods the utilization
of mathematical and conceptual problems
of theoretical analysis.

5. This research lends itself to immediate ap-
plication in on-going developmental situa-
tions.

6. Although it is not an inherent characteristic 
of action research that it always be a coop- 
erative enterprise, the application of find- 
ings is usually more effective if the investi- 
gator works in close collaboration with the 
persons of the agency or institution being 
studied. 


Philosophy of Cooperative, Developmental
Improvement


One of the greatest deterrents to research on 
the part of teachers (and also on the part of super- 
visors who want to involve teachers in research) 
is the fear, apparently, that the research will not 
conform to ‘‘high standards’’. Also, on the part 
of teachers, research in the traditional sense 
seems too far removed to be of fairly immediate 
help with problems of highly immediate import- 
ance. 

The work-group-conference method encour-
ages developmental growth in teachers’ research
abilities. Within a typical teacher-administrator
or other adult group several levels of sophistica-
tion in research ability will usually be found. As-
suming that the group is well led and that con-
ditions necessary for its successful operation are
present, the members are likely to become se-
cure and reasonably confident in attempting to set
up a design and try objective problem solving.


The fact that attempts at problem solving 
fall at various points on a continuum rang- 
ing from careless, untested inquiry to 
careful and reliable research is rarely em- 
phasized. This is regrettable, because, 
although teachers and other people value 
research in the abstract, they feel that it 
has little relation to the methods they must 
employ to solve their own problems. There 
is little motivation for practical problems 
to move in the direction of better and bet- 
ter research methods. They are learned 
with practice. To refrain from trying be- 
cause one lacks skill or has perfectionist 
aspirations precludes improvement, and 
improvement is what counts. (4) 


It is against this background of supervision and 
curriculum development theory and current con- 
cepts of research in this field that the problem of 
this study took form. 


SECTION II
Structure and Development of the Study 


THE GENERAL plan of the study included:
1) inviting teachers to participate voluntarily in
a two-semester project; 2) enlisting the support
of the principal, department head, and special-
ist supervisor; 3) providing for biweekly meet-
ings over two full semesters; 4) providing for
guest speakers, special materials, films, visits
to other schools, etc.; 5) having the author of this
report serve as coordinator and organizer of
group activities, encouraging members of the
group to undertake experimentation in their own
classes and to report back to the group; 6) meas-
uring teachers’ perceptions of the roles of the
group’s status people.

The report of the study described four phases 
of our group’s development, phases similar to 
those that are traced by Thelen and Dickerman
in ‘‘Stereotypes and the Growth of Groups’’ (7). 
Because we followed the pattern of tracing major 
phases of the group’s growth, it is important to 
list briefly questions related to each phase (3). 


1. What happened to the project of our group? 

2. What happened to our group and the indi- 
viduals in it? 

3. What blocked the work of our group? 

4. What facilitated the work of our group? 


A final item under each phase was: 


5. Summation of evidence and interpretations 
of each phase. 


Methodology: Procedure and Sources of Data 


This project was an action research, cooper- 
ative type of study. All the participants worked 
on one major problem: the improvement of in- 
struction in English at this one junior high school.
The major features of the methodology include
the following: 


1. Each teacher had the freedom to work on 
a specific problem in the English (Language 
Arts) area. 

2. Teachers were encouraged to work in a 
manner of their own choice: a) cooperatively on 
one problem; b) individually on separate prob- 
lems within the scope of the English curriculum; 
or c) in freely formed subgroups on one or sev- 
eral problems. 

3. The teachers’ populations for study were 
their own pupils from one or more of their own 
classes and/or the available data on all pupils 
(cumulative records, reading test scores, fam- 
ily background information, etc.). 

4. The writer’s population for the study was 
not the pupils but all the twelve members of the 
study group. 


Part of the methodology includes definition 


* All footnotes will be found at end of article. 


of the roles of various people in the study: 1) the 
principal, 2) the department head, 3) the super- 
visor, 4) the coordinator, and 5) the roles of 
the nine participating teachers. 

The work-group-conference method was in- 
trinsic to the broad action research methodology 
of the whole project. The total group met about 
every two weeks (total of eighteen meetings in two 
semesters) after school for planning, discussion, 
reporting, and evaluation. The average length 
of each meeting was two hours and fifteen min- 
utes. The group focused its attention on ‘‘con-
tent’’, viz., various aspects of English: reading,
grammar, composition, testing, spelling, hand- 
writing and other things. The writer was con- 
cerned with interaction, the dynamics of the group 
situation and any interaction between meetings as 
well as with the English curriculum, the teaching 
of reading, spelling, writing, listening, grammar, 
etc. 


Types and Sources of Data Used¹*


The following is a descriptive listing of the 
types and sources of data which were obtained 
over the entire forty-week period, starting in Sep- 
tember 1955 and ending in June 1956. Some addi- 
tional data were obtained in August and September 
of 1956, and this will be the last item on this list. 
Because the data were gathered in an on-going, 
evolving ‘‘situational frame’’, no attempt is made 
here to place the items chronologically in terms 
of ‘‘when’’ they were collected. 


1. Descriptions of Individual Research Pro-
jects. Each teacher submitted a ‘‘Progress Re-
port’’ on his research project during the thirty-
second week of the Project. In the fortieth week,
each teacher contributed a Final Written Report
on his project in which he described, analyzed,
and interpreted data gathered in his experiment.
a) These reports were consolidated into a 
Group Final Report and submitted to the 
Director of the Language Education De- 
partment. 
b) Each member of the Study Group re- 
ceived a hectographed copy of the Group 
Final Report. 

2. Oral Reports. Some members of the group
gave oral reports on their projects during the 
course of the Study. A discussion period fol- 
lowed each of the reporting sessions. 

3. Evaluation of the Project. Each member of 
the committee (group) was asked to respond (in 
writing) to the following: ‘‘What data, ideas, opin- 
ions or impressions have you gained from (a) the 
particular project you selected, and (b) what ef- 
fect have the Study and the conferences had upon 


your approach to your teaching problems?’’ 

a) This was done partially in the Progress 
Reports mentioned in item 1 in this list.

b) More evidence, although again only par- 
tial, was gotten in a more controlled 
manner through administering an “Opin- 
ionaire”; this device was administered 
twice, the second time at the suggestion 
of the group. 

4. Statements of Purpose. During the fourth 
week of the project each member was asked to 
state, in writing, what he perceived his own 
purpose to be in participating in the Study. These 
‘‘Statements of Purpose’’ were compared to
‘‘Self-Evaluation’’ statements completed at the
end of the study. 

5. Records of Group Meetings. A factual 
record (minutes) of each meeting was kept. Each 
set of minutes was analyzed and interpreted by 
this writer. 

6. Records of Conversations and Consulta-
tions. Insofar as it was possible to be accurate
and objective, several relevant talks between
this writer and individual members of the group
were described.

a) We tried to develop here the ‘‘Key Peo- 
ple’’ concept and how it relates to friend- 
ship factors and informal communica- 
tion. 

b) Attempts were made here, also, to show 
(1) comparison of some members’ pri- 
vately expressed views on the project 
with those expressed at Group Meetings; 
(2) values of liaison between the coordi- 
nator and key members of the group 
outside of regular group meetings. 

7. Measures of Rise and Fall of Members’ In-
terest and Attitudes Toward the Project. ‘‘End-
of-Meeting Evaluation Slips’’ were given to the
group members periodically. Each individual
filled out such a slip.

a) A ‘‘Consolidation Sheet’’ showing in
minute detail all the responses made on 
the individual End-of-Meeting Slips was 
prepared by the coordinator and given 
to each group member. 

b) Both the individually completed slips 
and the Consolidation Sheet show rise 
and fall of interest as well as general 
attitude. 

c) The ‘‘Slips’’ gave us an index on each
individual while the ‘‘Consolidation 
Sheet’’ showed a group (total) reaction. 

8. Data on Power Structure. Evidence was 
gathered indicating what the group members per- 
ceived to be the power structure of the total 
group. 

a) An ‘‘Opinionaire’’, a type of projective
instrument, was prepared by this writer
and administered to all members of the
group.

b) The Opinionaire sought to discover spe-
cifically ‘‘what the teacher-members per-
ceived the roles of the principal, depart-
ment head, supervisor, and the coordi-
nator to be in this study.’’

c) The Opinionaire was given a second time
to only the teacher-members at their own
suggestion; the results of the first and sec-
ond responses will be compared.

9. Evidence of Attitudinal Changes in Each of 
the Status People in the Group. This evidence is
gathered from all of the sources mentioned above
but treated separately in order that discrete state- 
ments may be made about each of the three ‘‘status’’ 
people—the principal, the department head, and 
the supervisor. 

10. Self-Evaluation Statements. Some mem-
bers of the group were invited to submit a descrip-
tive statement in which they attempted to answer:
‘‘What did involvement in this project do for me
personally?’’ The responses to this question will
also be compared to the ‘‘Statement of Purpose’’
prepared at the beginning of the study. The Self-
Evaluation statement was asked for during the
summer following the termination of the study.


Anticipated Outcomes of the Study 


1. It was felt that the study would give us var- 
ious kinds of evidence regarding the practicality 
and efficiency of using the work-group-conference 
method to accomplish curriculum change in a field
situation over a relatively short period of time 
but under intensive application. 

2. Some anticipated, specific outcomes in the 
field of supervision were focused on questions 
such as the following: 

a) Can teachers’ values and attitudes rela- 
tive to education be effectively changed 
through involvement in cooperative group 
effort in curriculum improvement? 

b) Will the changed values and attitudes be 
reflected in the curriculum? 

c) What are the elements in a group situa- 
tion that help people work together har- 
moniously? 

d) What conditions are conducive to stimula- 
ting people to be creative in a group pro- 
ject? 

e) What factors help or hinder communica-
tion among people in a group?

f) Are ‘‘key’’ people needed to initiate, devel-
op, and maintain a group as it passes
through the various stages of develop-
ment? If so, who are they? What ident-
ifies them?

3. In regard to working on a departmental lev- 
el, some evidence should be of value to depart- 
ment heads and others interested in working to- 
ward change through the departmental unit. 

4. For teachers and others interested in at-
tempting further change of the curriculum in the
English or Language Arts area, this Study
should give insights into the following:

a) What aspects of the English curriculum 
do the teachers rate ‘‘most important’’? 

b) Are teachers’ differences more appar- 
ent than real in their view and practice 
of teaching English? 

c) Can ‘‘pilot groups’’ in one or more
schools effectively influence curriculum 
patterns in other schools of the same
system?

The study succeeded, we believe, in making 
concrete the vague ‘‘intangibles’’ that comprise 
what is known as a ‘‘group’’. Communication
between people was one of the common factors 
examined directly and indirectly. It would be 
well for the reader to remember at all times the 
importance the writer gave to the communication 
factor throughout the study. 


Limitations of the Study 


Generalizations cannot be made from this 
study to any other population but that involved in
the project. This is a descriptive study, one 
which employs a case-study approach. The four- 
teen teachers and administrators involved as 
well as the pupils with whom the teachers worked
are the limits of the population to which general- 
izations can be applied. 

Because it did utilize the case-study approach, 
however, the results can serve as an indication 
of what could be expected of the work-group- 
conference method, action research, etc., under 
reasonably similar circumstances. 

Another limitation arises from the fact that 
the coordinator-recorder and the teachers were 
obviously subject to error in recording, trans- 
posing, documenting and interpreting data. This 
was constantly and continuously checked, how- 
ever, and all minutes and other written data 
were examined by the teachers at every group 
meeting and approved by them. 


The Participants— Teachers 


There were originally eleven teachers in the 
project, all teachers of English but some with 
specialized jobs within the department. Two 
taught ‘‘general language’’ in addition to English. 
One taught journalism and acted as adviser to the 
school newspaper. A fourth member had a ‘‘ra- 
dio workshop’’, a complete broadcasting studio 
which held regular daily classes. This teacher 
was also the building audio-visual chairman. A 
fifth teacher was actually a member of the Social 
Studies Department but taught remedial reading
in the English Department. Another type of dis- 
tinction among them was the grade levels taught. 
One particular teacher handled seventh grade 


English classes only, while another taught exclu- 
sively ninth graders. 

The age range of the members (an all white 
group) was from 22 to over 60 years. Teaching 
experience varied from one to more than 40 years. 


Participants—Leadership and Administration 


There were four people involved in the study 
who by their formally designated positions had to 
assume leadership and administrative responsi- 
bility for the total project. These people were 
the Supervisor of Language Education, official 
representative from Division of Instruction, De- 
troit Public Schools; the writer who served as co- 
ordinator and group recorder; the school princi- 
pal who was actually ex-officio chairman of the 
group and through whom most of the group’s de-
cisions affecting curriculum changes and class-
room experiments had to be cleared; and the head
of the department in Urban School. The latter
worked with the coordinator in forwarding com-
munications to the group, making emergency ad-
justments of meeting schedules, and in making
available time or materials needed by teachers 
carrying out their individual research projects. 

The Director of Language Education, though 
never deeply nor personally involved in the pro-
ject, approved it formally. His assistant, the
Supervisor of Language Education, was the real 
administrator-participant, however. It was she 
who helped plan, formulate, and direct the pro- 
ject and set at least part of its major goals. 

The coordinator’s responsibilities included 
initial planning with other group administrators, 
arranging for meetings, finding clerical assis- 
tance, keeping the group informed and directed 
and, finally, leading the group to reporting and 
evaluation. 


The Purposes of the Study Group as They 
Were Perceived by Administrators of the 
Project—A Restatement 


Initially the administrators perceived the 
goals of the group differently than did the teach-
ers. Below is a specific restatement of purposes
of the group as seen by the coordinator, supervis- 
or, principal, and department head. 


1. To stimulate teachers, by largely non-di-
rective means, to organize, conduct and
evaluate a curriculum research project
within their own department in their own
school.

2. To develop in teachers abilities and in-
sights related to: a) appreciation of util-
izing research methods; b) applying, test-
ing, and evaluating results of their own and
others’ research in the classrooms;
c) values of cooperative effort to improve
the curriculum of their own department; and
d) perception of themselves and others
from the view of their own values, skills,
and attitudes as these influence the curric-
ular experiences of pupils.

3. To develop leadership abilities of the teach-
ers and to bring them closer to the reali-
zation that leadership shifts in a group
situation from one person to another.

4. To motivate teachers to creativity both in
the group and in the classroom.

5. To emphasize that research ability is a de-
velopmentally acquired skill and encour-
age teachers to work at it.


Purposes of the Group as Proposed for the 
Teachers by Administrators of the Group 


1. To improve the instructional program in 
English at Urban Junior High School. 

2. To try new methods, materials, and tech- 
niques in the classrooms. 

3. To understand better existing methods, 
practices, materials, and techniques. 

4. To contribute, ultimately, the results of 
the group’s work to a new Curriculum 
Guide in English for Junior High Schools. 


Procedure 


The project was organized around two focal 
points, in terms of its operation: 1) regular 
group meetings held after school at Urban Jun- 
ior High every second week for two consecutive 
semesters, and 2) the carrying out of instruc- 
tional change—experiments, tests, re-examin- 
ation of established methods and materials—by 
the teachers with their pupils in the regular 
classroom situations. The biweekly meetings
were aimed at planning, discussion, and evalua- 
tion leading always to application or modifica- 
tion of the classroom research being done by the 
teachers. 

Early in the project it was decided by the 
group that the coordinator would be doing a mu- 
tual service for himself and the group by acting 
as recorder. The notes or minutes of each 
meeting, as well as other material needed by
the group, were then hectographed at the home
of one of the group members. This person hec- 
tographed almost all the materials which grew 
out of the group project. 

The coordinator was present at every meeting
(eighteen in all) of the group. The principal,
department head, and the supervisor were not
always present, and when they were, they did
not always stay for the entire meeting. This was
especially true of the principal, who felt, for a
time, that his voluntary withdrawal from some
meetings might ‘‘free’’ the teachers from a restraint
often existent when a status person is in the group.

Periodically, end-of-meeting evaluation slips
were filled out by teachers and other types of 
data collected from them. Whenever such data 
were requested it was made explicit that the data 
would be used both for feedback to the group and 
for the writer’s dissertation. 


SECTION III


Interpretation of Data2 


THE FOCUS of the data-gathering instruments
was on tracing (1) developmental growth of the
participants in skills and insights related to
interpersonal relations, (2) research abilities and
appreciations, (3) changes in self-perception,
(4) awareness and understanding of the group’s
power structure, (5) evaluation skills, (6) communication
patterns, and (7) appraisal of methods
and materials in the English curriculum of Urban
Junior High School.

The interpretation of the data was done by various
means. Very little of the data could be
quantified or measured by existent mathematical
and statistical methods. For example, resistance
to an idea or to a person might be expressed
in many ways: facial expressions, verbal response
or withdrawal, bodily movement, degrees
of hostility or enthusiasm, etc. For these kinds
of data, direct observation and subsequent interpretation
by the coordinator were utilized.

The Anecdotal Records of Group Meetings 
were interpreted by presenting a verbatim ac- 
count of each meeting and analyzing the state- 
ments made against the background of the total 
context of that meeting, the total project, the 
participants themselves, and the ‘‘natural social 
group’’ factors in a field situation. Also, these 
meetings included and were influenced by activi- 
ties of the participants: discussion, planning, pre- 
senting reports, giving research findings, etc. 
What was said in the meetings was frequently com- 
pared to what was actually done. 3 

Comparison, then, was a useful tool in the in- 
terpretation of data. Various kinds of data were 
compared. Some examples of these include the 
following: 


1. Comparison of each member’s ‘‘Statement
of Personal-Professional Purpose for Participation
in the Project’’ (written early in the study)
with ‘‘Description of Individual Research Projects’’
and ‘‘Self-Evaluation Statements’’ (the latter
two statements made at the end of the study).

2. As evaluation of the project was constant
and continuous, bimonthly sources of data like
‘‘End-of-Meeting Evaluation Slips’’ and ‘‘End-of-
Meeting Consolidation Sheets’’ were compared.
These were further compared to each member’s
behavior in the group and to his reported research
and teaching activities between meetings.

3. Oral and written reports by individual
members were compared to their original “Prob- 
lem Selection” and “Statement of Purpose”. 

4. Each member of the group was asked to 
answer, in writing, the following questions after 
the first half of the study was completed: 


‘‘What data, ideas, opinions or impressions
have you gained from a) the particular
project you selected, and b) what effect
have the study and the conferences had upon
your approach to your teaching problems?’’


The responses to these questions were also
compared to items 1-3 above.


5. Data on the group’s power structure were
obtained by an ‘‘Opinionaire’’,4 a projective
type instrument. The first time the members
completed this instrument they were asked to respond
as ‘‘a typical member of the group’’. Evidence
showed they had not answered as a ‘‘typical’’
member but in reality had projected their
own personal opinions. At the request of the
members the instrument was presented a second
time, two weeks later. On this occasion each
member requested to be allowed to answer with
his own opinion, not that of a hypothetical ‘‘typical’’
member. After completion of these opinionaires
(using the same items), comparison was
made to the first responses.


There were 36 questions in each opinionaire.
Each of the nine teacher members completed
each opinionaire fully on both occasions. In only
nine instances of a total of 648 responses were
there any absolute changes in response. None
of these differences were statistically significant
at the five percent level.
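
The response counts in the preceding paragraph can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: the variable names are hypothetical, and it assumes that the total of 648 counts both administrations of the 36-item opinionaire by the nine teacher members.

```python
# Illustrative check of the opinionaire counts reported in the text.
# All names are hypothetical; the figures are taken from the report.
QUESTIONS = 36        # items in each opinionaire
TEACHERS = 9          # teacher members who completed it
ADMINISTRATIONS = 2   # first and second presentation
CHANGED = 9           # instances of absolute change in response

# Assumption: the reported total covers both administrations.
total_responses = QUESTIONS * TEACHERS * ADMINISTRATIONS
share_changed = CHANGED / total_responses

print(total_responses)
print(f"{share_changed:.2%}")
```

Under that assumption the changed responses amount to between one and two percent of the total, which agrees with the report that the two sets of answers were nearly identical.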

Interpretation of data was, in summary, accomplished
by direct observation, nonquantified
content analysis of written materials, and comparisons
among a variety of written material, between
written material and verbal responses
and/or behavior in the group as well as between
meetings. Further, analysis of written and verbal
data was compared to data on the action level.
Emotional responses, attitudes and values, as
well as skills, were constantly scrutinized, singly
and as a highly interrelated group, for patterns
of change and growth as well as for isolation of
particular instances of change.


SECTION IV
Findings of the Study 


EARLIER IN this report, five major hypotheses
were listed. These hypotheses are restated
here. The results of the testing of each hypothesis
follow each one of them respectively.


Hypothesis A: There will be tangible modifications
in classroom practices of the teachers
involved in the study.


Evidence Relating to Hypothesis A— For the 
greater part of the study there were nine teachers 
plus a teaching department head involved in this 
study. Based on reports (oral and written) of 
each of these participants, the evidence of tangi- 
ble modifications is clear-cut and definite in seven 
of the ten participants’ classrooms. In the case 
of the eighth member, changes were less than the 
writer had expected while in the classrooms of the 
ninth and tenth members changes were very few 
or none. 

Some examples of modifications were the following:


The initiation of English-social studies core 
classes in the classrooms of two of the 
members. 

A systematized, extensive penmanship pro- 
ject in the classroom of one of the members. 
Re-evaluation of the purposes of testing chil- 
dren on the part of a member. 

Utilization of personality inventories or 
problem checklists. 

Planning, developing, recording and evalu- 
ating a ‘‘new’’ spelling program. 
Examination of pupils’ reading comprehen- 
sion abilities and subsequent adjustment of 
the reading program. 

Re-organization of classroom management 
procedures leading to greater efficiency 
and more ‘‘teaching time’’. 


Hypothesis B: A variety of instructional methods
will be tried and tested.


Evidence Relating to Hypothesis B—In addition 
to the evidence cited under Hypothesis A above, 
it should be noted that Hypothesis B was also, in 
the main, supported. 


1. The department head worked with two stu- 
dent teachers and developed the unit plan 
approach. 

2. Another member tried two different meth- 
ods of teaching composition skills. 

3. A third member tried and tested a method 
of improving use of the dictionary. 

4. Still another applied ‘‘phonics’’ principles
to reading and writing skills and compared
these to a method where phonics were virtually
unmentioned to the pupils.

5. Another member switched from the arbitrary
teaching of formal grammar to a
method based on greater ‘‘individual analysis’’
and a way of teaching the ‘‘most
necessary skills’’.

Hypothesis C: There will be an increase in the 
confidence of teachers in defining problems. 

Evidence Relating to Hypothesis C—There
was a gradual and, at times, almost imperceptible
increase in confidence of teachers in defin-
ing problems. For many weeks and months the 
teachers kept examining methods, materials, 
techniques, classroom load and other problems 
external to them. From this they moved to de- 
fining problems in the children—their growth, 
personality, learning and cultural problems. 
Then the teachers began facing problems re- 
lated to teacher-administrator relations. Some 
of the participants actually got to a point where
they were examining themselves, analyzing their
own motives, values and attitudes.


Hypothesis D: Teachers will feel increasingly
secure in exchanging suggestions with each
other.


Evidence Relating to Hypothesis D—One has 
but to examine the Minutes of Group Meetings or 
the statements made in the Progress Report and 
the Final Report of the group to trace the in- 
crease of candidness among the members. M7 
and M5 as well as M3 and M$ stated explicitly on 
more than one occasion that knowing the other 
teachers were facing the same problems as they 
themselves were facing made them feel more
free to ask for help and suggestions. 

During group meetings, especially during 
Phases Three and Four, members did not hesi- 
tate to say, ‘‘Why don’t you try this?’’ or ‘‘I’ve 
used this technique and it worked under some 
conditions. Why don’t you give it a try?’’ 

Also, as time went on, the principal, super- 
visor and coordinator felt more free to suggest 
techniques and methods. The department head, 
because of her closeness to the teacher-members,
was able to offer suggestions to the group
from the very beginning. The pattern of re- 
sponse to her suggestions changed as the mem- 
bers became more ‘‘group’’ oriented. 


Hypothesis E: The Head of the English Depart- 
ment and the Principal will allow and encour- 
age teachers to try, test, and develop newer 
methods, techniques, and courses. 


Evidence Relating to Hypothesis E—Minutes 
of group meetings show that the principal repeatedly
encouraged teachers as stated in Hypothesis E.

It was due to his support that the department
head later worked core into the English curriculum.
The principal supported most suggestions
made by members of the group and helped the
members try different methods. The writer believes
that the principal’s contributions were the
most beneficial ones coming from an administrator
involved in the project and his positive attitude
did contribute a great deal to the life and
value of the study.

The evidence for Hypothesis E seems conclusive
to the writer as far as the principal is concerned.
The evidence, on the other hand, to show
that the department head ‘‘allowed and encouraged
teachers to try, test, and develop newer
methods, techniques, and courses’’ is inconclusive
at this time.

There is one thing, however, which this study
did demonstrate most pointedly. It is related to
the role of the department head (in this study) but,
more specifically, to members’ use of research
materials.

Only twice during this entire study did a member
actually utilize the research summaries (findings
of experts, resource materials, or reports
of research in progress) that were brought in to
the group by the coordinator. In each of the two
instances that such material was used it was by
the same member. On both occasions the material
related to penmanship. On the other hand,
never during the entire forty-week period did any
member give indication that he ever utilized the
hectographed reprints of research findings (as
prepared by the coordinator and the group’s secretary).
As far as the evidence of this study
shows, such research materials, made easily
and conveniently available to all members of the
group, did not affect or modify the teaching practices
of any group member.

Because the department head was strongly
‘‘for’’ the practice of inviting experts in to address
the group, it seems significant to the writer
that the department head as well as the other
group members never took advantage of research
summaries and articles in printed form. It might
well be that the opinions or findings of experts as
presented to the group in this study were actually
as ineffectual as the written research materials
which the members were given. Again, this kind
of teacher-reaction seems to bear out the two
major assumptions made at the outset of this
study.


Three Questions Related to the Five 
Major Hypotheses 


1. What are the strong points of the action 
research method and the work-group-con- 
ference technique? 


In the Urban study the most outstanding contri- 
bution of the action research method and employ- 
ment of the work-group-conference technique was 
that classroom instruction was improved. The 
improvement varied from teacher to teacher 
over the thirty-eight week period but all of the
teachers stated that the project had afforded 
them opportunities to improve classroom instruc- 
tion. Some examples of teachers’ statements 
are presented below: 


a) ‘‘My opportunity to teach core came as a
direct result of our meetings.’’

b) ‘‘My study was on Remedial Penmanship.
It will culminate in a new Handwriting
Scale. It has solved for me the question
‘How can I improve the penmanship of my
pupils?’ and has answered questions of
many years’ standing.’’

c) ‘‘I have found that many of the ideas and
methods which I have used for the last
eight years are basically sound. The conferences
have motivated me, vexed me,
and defeated my tendency toward laziness
in educational theory. My obsession of being
a scholar has given way to one of being
an outstanding teacher.’’

d) ‘‘I have gleaned many techniques (of instruction)
in the past year from our discussions
that probably would have taken years
to discover by myself, if ever... I was beginning
to think that my own teaching situation
consisted of the four walls of my
classroom but the discussions caused me
to realize more forcefully that education
is a process of the whole school.’’

e) ‘‘... I incorporated ideas heard at meetings
in my teaching... the conferences developed
a liberal attitude within me to experiment
and find better teaching methods;
this makes a better teacher.’’


Besides the explicit statements of the teachers
regarding improved teaching in their classrooms,
the Progress Reports, Minutes of Group
Meetings, Consolidation of End-of-Meeting Evaluation
Slips, and the Final Report all show that
teachers were thinking about ways to improve
classroom instruction. The experiments described
in the Final Report reflect much thought,
work, and effort on the part of most teachers but
the writer believes that the Minutes of Group
Meetings reveal a much wider and deeper kind
of effort and growth. In the latter records can
be found evidence of teacher-growth during the
entire study project period in broadened viewpoints,
more scientific approach to problems,
more social insight, greater personal interest
in teaching problems, an increased desire to
look at children as individuals, and a constantly
developing awareness that their teaching could
be improved. In cases where teacher-growth
was less than expected, it seemed to the writer
that many causes could have been operative.
Among these was the specific problem of ‘‘unfavorable
group conditions’’.


2. Under what conditions can these methods 
best be used in creating curriculum change? 


In the original report of this study, it was
pointed out that certain conditions are necessary
for 1) a group to develop and operate harmoniously,
and 2) people in a group to be stimulated toward
creative participation. Good channels and
methods of communication are discussed in the
report and also the importance of key people. In
the Urban study the single most important condition
which was necessary for the success of the
project was that of ‘‘support’’. Assuming that
the curriculum of a given junior high school can
profit from the concerted effort of a number of
teachers working at it cooperatively, the evidence
of the Urban study indicates that it is of utmost
importance that:


a) The department head (in a departmentalized 
junior high school) is fully and completely 
in accord with the idea of trying to do such 
a project. This kind of project will, in all 
likelihood, not succeed unless this support 
is constant. 

b) The principal must not merely ‘‘allow’’ it
but be active in lending it support by giving
it the added aura of his prestige and actively
serving it by backing teachers’ decisions
relative to curricular improvement. He can,
furthermore, be of greater service by participating
in group sessions when his presence
will be a positive force and his contributions
(valuable due to experience and special
knowledge) will enhance the work of
the group.

c) The supervisor, like the principal and de- 
partment head must accept and support the 
idea of action research. In order of prior- 
ity of value to the project (influence, status, 
decision-making), the writer feels the rank- 
ing is department head, principal and, last- 
ly, supervisor. 

d) Enough time must be allowed. This means 
that some teacher-release time should be 
made available and also that the total length 
of the project be allotted a time period ap-
propriate to the growth and development of 
the group of participants. To ‘‘cut off’’ the 
project before the group has completed its 
whole job might mean the negation of much 
that has gone before—the work, time, ef- 
fort, and even, in some cases, isolated 
‘‘islands of success’’ in the ‘‘improving’’
curriculum as well as in human relations.
A premature forced stop might well be one 
of the worst things that could happen. 


3. What are the limitations of the methods and
application of action research and the work-
group-conference technique?

The Urban study strongly indicated that initial
and continuing support of the project by the prin- 
cipal and department head was essential. With- 
out their initial support the study could not have 
begun. Weaknesses and snags in the study usu- 
ally occurred either directly or indirectly as a 
result of a decrease of support by either mem- 
ber. On the other hand, when strong support 
was forthcoming the morale and production of 
the group increased. The limitations of action 
research and the work-group-conference tech- 
nique in this study came about through lack of ad- 
equate support. This lack of support manifested 
itself in several ways and these comprised the 
limitations of the methods and their application. 


a) The Urban project received no financial 
support either from the school fund, the 
English Department fund, or from the sys- 
tem-wide treasury. 

b) Teachers received no release-time for 
meetings of the group. Everything was 
done on their own time after school. 

c) The coordinator and group secretary (a volunteer
for the job) did all recording of minutes,
transcribing, typing, and hectographing.
In addition, many letters, brochures,
reprinted resource materials, and other
correspondence related to the group were
typed, hectographed, and mailed by the coordinator
and group secretary.

e) Although there is undisputable evidence that 
the majority of the group was in favor of 
continuing the study for a third semester, 
this opportunity was lost. 


Another ‘‘limitation’’ of action research as it
was done in this study is that no concise research
design can be formulated long in advance
of initiation of the project. The design evolves
from the ongoing project. This is certainly an
initial limitation but it need not be a continuing
detriment throughout the life-span of a project.
It is, rather, a ‘‘developmental’’ characteristic
of the method. If the group and the individuals
in it proceed in a typical fashion, designs should
become increasingly sharper and more sophisticated.
At Urban some teachers moved from what
we might characterize as the lowest points on a
‘‘research competence’’ scale to relatively high
points on the scale in less than forty weeks.


Conclusion 


The study at Urban Junior High in Detroit
was an honest attempt to test, in a field situation,
the effectiveness of the work-group-conference
method as a technique in curriculum
change and improvement. Over a forty-week
period the writer saw the teachers’ values, attitudes
and skills relative to education change
enough to permit acceptance and development of a
democratic, research-oriented way of doing
things.

It would be unrealistic to say that ‘‘great’’ cur- 
ricular changes occurred because of the project, 
but, on the other hand, it was never our expecta- 
tion that changes should or would be of great mag- 
nitude. The terminal point of our study should 
have been, we believe, the ‘‘new’’ point of de- 
parture for the teacher group in further explora- 
tion of the curriculum and themselves. The in- 
sights and skills acquired by each teacher, how- 
ever, during the study may benefit some future 
group in advancing research and cooperative 
action. 


FOOTNOTES


1. Examples of all the data-gathering instruments
are shown in the Appendix of the original
dissertation.


2. Within the limits of this report it is possible
to give only brief samples of how data were
interpreted. For complete examination of data
analysis see the original dissertation,
Chapters IV-VII and Appendix, pp. 351-451.


3. There is much evidence in the group’s 75-page
Final Report to indicate that all members
moved definitely to the action level. An
abstract of the Final Report is in the Appendix
of the original dissertation.


4. See Chapter VII, pp. 282-95, for a complete
analysis and Appendix of dissertation.


LITERAL AND CRITICAL READING
IN SOCIAL STUDIES 


E. ELONA SOCHOR 
Temple University 


The Problem and Its Scope 


THE PURPOSE of this study was to investi-
gate certain aspects of reading comprehension 
among fifth-grade pupils. In order to explore 
the problem, it was necessary to construct and 
validate an intermediate-grade reading test in 
social studies. The specific problems consid- 
ered were: 


1. What is the relationship between verbal
intelligence and
a. ‘‘General’’ reading ability? 
b. Achievement in literal reading compre- 
hension in social studies? 
c. Achievement in critical reading com- 
prehension in social studies? 


2. What is the relationship between ‘‘general’’ 
reading ability and 
a. The ability to comprehend literally in 
social studies? 
b. The ability to comprehend critically in 
social studies? 


3. What is the relationship between proficiency 
in literal and critical interpretation of social 
studies?


4. What is the relationship between proficiency 
in each selected critical reading skill and 
the ability to comprehend literally in social
studies? 


Justification of the Study 


To date, the concepts of the measurement
and development of reading comprehension as
held in the 1920’s are still widely evident at all
educational levels. Reading tests, largely limited
to the appraisal of ‘‘sense-meaning’’ and
weighted with materials from the field of literature,
are used commonly to determine all reading
needs. Little attention is being given to
critical reading skills in study situations. In
practice, reading tends to remain a ‘‘unitary
ability.’’


Such a concept of reading ability is no longer 
tenable. Conclusive data from studies by Dewey 
(11), Tyler (32), Thorndike (30), and Davis (10) 
indicate that adequate reading comprehension ne- 
cessitates not one but several levels of mental 
functioning. Although this premise is well estab- 
lished, much research still needs to be conduc- 
ted on the more specific aspects of reading com- 
prehension. 

Moreover, the skills and abilities characteristic
of effective reading interpretation are not
the same in all content areas. The specificity 
of skills within subject-matter areas at the sec- 
ondary and college levels has been substantiated 
by Shores (27) and Humber (18). Further data 
are needed on the nature of these skills in each 
content area, particularly at the elementary 
school level. 

Another major reading problem in education
today is verbalism. Too much emphasis has
been placed upon a low-level type of interpretation.
Retardation in reading has been determined
too frequently in terms of word perception
alone. As early as 1921, investigators reported
deficiencies in the reading comprehension of
some school children. Although these children
could reproduce what they had read with a
‘‘parrot-like precision,’’ they had little real understanding.
Since then, many educators and research
workers have stressed the importance of
this problem (4,17,25). Nevertheless, verbalism
in reading still appears to be rampant in
every phase of school activity. To help resolve
this situation, measures of appraising and techniques
for developing the various aspects of literal
and critical interpretation in reading need
to be investigated. The major solution, however,
rests with the schools. Effective reading comprehension
must be emphasized in all reading activities.

The importance of reading comprehension is 
not limited to school life. The ability to inter- 
pret what is read on current events is vital to the 
preservation of democracy. In such a social or- 
der, comprehending what is stated directly is a
prerequisite. Mere literal interpretation, how- 
ever, is not sufficient. The citizen must be 


* Abstract of Doctor of Education thesis, Teachers College, Temple University. 
skilled in evaluating critically the wealth of
available printed materials. 


Limitations of the Study 


Experimental Design—The purpose of this in- 
vestigation was to study the relationships be- 
tween intelligence and three types of reading 
ability: ‘‘general’’ reading, literal interpretation,
and critical interpretation. Final data
were obtained on a representative sample of five 
hundred thirteen fifth-grade pupils. To obtain 
these data on reading skills in social studies, it 
was necessary to construct and validate the ex- 
perimental edition of a reading test in that con- 
tent area. A group test of verbal intelligence 
and a standardized reading test were used to help 
in estimating the normality of the distribution. 

Statistical Design— The results were ana- 
lyzed with large sample techniques which includ- 
ed the Chi-square test for presence or absence 
of relationship, and the product-moment and 
point-biserial methods of correlation. The read- 
ing test in social studies was evaluated by means 
of one technique estimating test reliability and 
three techniques estimating item validity. 
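
The three relationship measures named in the Statistical Design can be illustrated with a short sketch. The Python below is not the computing procedure used in the study, which the report does not describe; the function names and the sample figures are hypothetical and serve only to show what each statistic measures. The point-biserial function assumes an item scored 1 for right and 0 for wrong.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(item, score):
    """Correlation between a right/wrong (1/0) item and a total score."""
    n = len(score)
    passed = [s for i, s in zip(item, score) if i == 1]
    failed = [s for i, s in zip(item, score) if i == 0]
    p = len(passed) / n
    q = 1.0 - p
    mean_all = sum(score) / n
    # Standard deviation of all scores, with n in the denominator.
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in score) / n)
    mean_gap = sum(passed) / len(passed) - sum(failed) / len(failed)
    return mean_gap / sd * math.sqrt(p * q)


def chi_square(table):
    """Chi-square statistic for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical data: eight pupils, one test item, and their total scores.
item_right = [1, 1, 0, 1, 0, 0, 1, 1]
total_score = [30, 28, 15, 25, 18, 12, 27, 22]
# Hypothetical 2 x 2 table of observed frequencies.
counts = [[10, 20], [20, 10]]

print(point_biserial(item_right, total_score))
print(pearson_r(item_right, total_score))
print(chi_square(counts))
```

The first two printed values agree because the point-biserial coefficient is the product-moment coefficient applied to a two-valued variable. The Chi-square figure must still be referred to a table of critical values for the appropriate degrees of freedom, a step omitted here.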

The Population—A total of six hundred elev- 
en pupils were tested. Complete results ob- 
tained on five hundred thirteen subjects were 
used in the study. 


1. Source: Eighteen fifth-grade classes were 
tested in June 1949. Nine of these were 
from four suburban Philadelphia schools, 
and nine from four urban Philadelphia 
schools. 

2. Age: The chronological age range was from 
10-0 through 14-6. 

3. Sex: Boys and girls were tested.

4. Race: White and Negro children were included.

5. Intelligence: Verbal intelligence quotients
ranged from 57 through 158.

6. Reading Grade: The reading grade ranged
from minus 2.5 through 10.3.

7. Final Population Criterion: Two hundred
sixty-nine cases of the final population fell 
within plus or minus one standard devia- 
tion from the mean on the two criterion 
measures—intelligence and reading grade. 
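
The final population criterion in the list above can be sketched as a simple selection rule. The Python below is an illustration only: the function names and the five sample cases are hypothetical, and the choice of a standard deviation computed with n in the denominator is an assumption, since the report does not say which form was used.

```python
import statistics


def within_one_sd(values):
    """Flag each value lying within one standard deviation of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: n in the denominator
    return [abs(v - mean) <= sd for v in values]


def criterion_group(iq, reading_grade):
    """Indices of cases within one SD of the mean on BOTH measures."""
    iq_ok = within_one_sd(iq)
    grade_ok = within_one_sd(reading_grade)
    return [i for i, (a, b) in enumerate(zip(iq_ok, grade_ok)) if a and b]


# Hypothetical cases, not data from the study.
sample_iq = [57, 95, 100, 105, 158]
sample_grade = [2.5, 5.0, 5.5, 6.0, 10.3]

print(criterion_group(sample_iq, sample_grade))
```

In this made-up sample the two extreme cases fall outside the band on both measures, so only the three middle cases are retained.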


Tests 


The Gates Reading Survey, Form I, Level of
Comprehension (published by Teachers
College, Columbia University), was administered
as a power test in ‘‘general’’ reading comprehension.
The Experimental Edition of the Intermediate
Reading Test, Social Studies, was
used to appraise literal and critical interpretation
in social studies. The Pintner General
Ability Test, Form A (published by World Book
Company), was used as a measure of verbal intelligence.


Definition of Terms 


The following terminology is basic to this
study. For purposes of clarity, literal reading
comprehension and the selected critical reading
comprehension skills will be illustrated as well
as defined. In each example, the correct response
for the test item will be the first, and one
of the distractors will be the second.

‘‘Literal Reading’’ represents the ability to
obtain a low-level type of interpretation by using
only the information explicitly stated. For example,
the selection states, ‘‘Millions of workers
dragged stone blocks for the outside walls and
packed basket after basket of earth between them.’’
The test item appraising literal interpretation of
this sentence is: ‘‘The outside walls were made
of (1) stone, (2) earth. . .’’

‘‘Critical Reading’’ represents the ability to 
obtain a level of interpretation higher than that 
needed for literal interpretation. In this study 
the following critical reading comprehension 
skills were set up: 


1. Functional Vocabulary tests the reader’s
background of experience in reference to a
concept used in the selection.

2. Semantic Variation of Vocabulary tests the
reader’s ability to identify a similar usage
of a given word from the selection. For example,
the word ‘‘beat’’ is employed in this
manner in a story: ‘‘Every day cruel slave
drivers beat these workers. . .’’ The test
item is: ‘‘The sentence which uses the word
beat just as it is used in line 26 of the story
is (1) Mother said, ‘Beat the rug until it is
clean.’ (2) The policeman’s beat was several
miles long. . .’’

3. Central Theme tests the ability to distinguish
the central topic of the selection from
subordinate ones. An example is: ‘‘This 
story as a whole is about (1) the largest 
wall in the world, (2) the early emperor of 
Cam...” 

4. Key Idea tests the ability to identify the key,
or most important, idea in the story. One
test item is: ‘‘The most important idea in
the story is that the Great Wall was (1) acting
like an army, (2) used as a highway. . .’’

5. Inference tests the ability to draw a specific 
conclusion indirectly from the material 
given. For example, the first selection 
discusses the need for the Great Wall and 
then states, ‘‘It was longer than 1500 
miles, more than half the distance across 
our own country.’’ The test item is: ‘‘The 
Emperor of China needed the Great Wall 
because China (1) was too large to protect
with soldiers, (2) had only a few soldiers 
who rode on horseback. . .’’ 

6. Generalization tests the ability to identify a
general conclusion or principle indirectly
from information implicitly stated. An example
is: ‘‘From the story we should believe
that ALL (1) buildings that last have
been built carefully, (2) workers of China
are better than the workers in our country


7. Problem Solving tests the ability to apply
information from the selection to a problematic
situation. One test item is: ‘‘Mrs.
Brown paid twenty-five cents for a can of
peaches. She said, ‘This is how the farmer
gets rich.’ She was wrong because (1)
the farmer gets only a part of the twenty-
five cents, (2) farmers get rich from dairy
products. . .’’

8. Association of Ideas tests the ability to see
the relationship among ideas in a series. 
For example, ‘‘The row with ideas from 
the story that belong together is (1) fierce, 
cruel, savage; (2) enemy, builders, horses, 


. Analogy tests the ability to perceive rela- 
tionship between two pairs of ideas. The 
idea which completes an established rela- 
tionship is identified: ‘‘Stones are to build- 
ing as people are to (1) nation, (2) houses 


s Antecedent tests the ability to recognize the 
word or words to which a selected pronoun 
refers. For example, ‘‘The word them 
in line 25 of the story refers to (1) outside 
walls, (2) people of China. . .’’ 

. Sequence tests the ability to determinea 
time sequence. One test item: ‘‘Below is 
a story about how certain vegetables reach 
the store. The first idea out of order is 
(1) The vegetables are canned, (2) the veg- 
etables are processed. 

. Extraneous Idea tests the ability to deter- 
mine relevancy of ideas to a particular se- 
lection. For example, ‘‘The idea NOT 
found in the story is that many(1) people 
were buried in the Great Wall, (2) emper- 
ors used the wall for protection. . .”’ 

. Author Purpose tests the ability to identify 
the author’s primary motive in writing a 
given selection. One test item is: ‘‘The 
author wrote this story because he thinks 
we should know (1) about great things in 
other countries, (2) about the enemies of 
.” 


‘‘Survey Reading Comprehension’’ is a meas-
ure of understanding based on the results of a
reading test which uses content largely from the
field of literature. The comprehension section
of the Gates Reading Survey was used in this
study. ‘‘General’’ Reading Comprehension is
used synonymously with ‘‘survey’’ reading com-
prehension.

‘‘Verbal Intelligence’’ is a measure of capac- 
ity which is obtained from a test that usually re- 
quires a high degree of language facility both in 
understanding directions and in the subject’s re- 
sponses. 


A Review of Kindred Literature 


Although most of the research on reading com- 
prehension and test construction has been conduct- 
ed at the secondary or college levels, investiga- 
tions at the elementary level tend to confirm the 
conclusions indicated in the research at the high-
er levels. Accordingly, the pertinent conclusions
from all the studies are summarized in terms of
two major areas: critical reading comprehension
and test construction.


Critical Reading Comprehension 


In 1917 Thorndike published three articles 
emphasizing the premise that reading is a think-
ing process (29, 30, 31). Since that time, educa-
tors have been concerned not only with the ‘‘sense-
meaning,’’ or literal comprehension of printed
material (14, 34), but also with a more thorough 
interpretation, or critical comprehension. Crit- 
ical reading comprehension has been defined as 
critical thinking in reading situations (4). 
Critical Thinking. — Since critical thinking is 
basic to critical reading comprehension, a sum- 
mary of the conclusions in the research on crit- 
ical thinking is pertinent to this investigation: 
1. Critical thinking necessitates the function-
ing of higher level thought processes (3, 30).

2. Critical thinking appears to be a complex
of component abilities, some of which seem
to have been identified (3, 12, 13, 32).

3. The manifestation of intelligence does not
guarantee the ability to think critically (3,
13).

4. The ability to think critically in one content
area cannot be assumed to indicate that the
same is true in another (25).

5. Aspects of critical thinking can be meas-
ured by paper-and-pencil tests (13, 32, 33).

6. Certain aspects of critical thinking can be
improved by instruction (13, 26, 33).

7. Fifth-grade children can think critically.
Moreover, the difference between their
ability to reason and that of adults is mere-
ly a quantitative one (8, 16, 21).


Critical Thinking in Reading Situations—The
research on critical reading comprehension re-
veals:


1. Critical reading comprehension has the same
attributes as those stated above for critical
thinking, but they apply when critical thinking
is done in reading situations (4, 10, 11, 12, 13,
26, 30, 31).

2. Literal reading comprehension appears to ne-
cessitate mental functioning of a lower level
than critical reading comprehension (3, 11, 14,
32, 34).

3. The ability to comprehend critically cannot be
predicted from the ability to comprehend liter-
ally, or factually (3, 11, 12, 25, 32).


Test Construction 


The need for better test measures at the ele-
mentary level has been stressed repeatedly in the
literature (4,7,19). The following list of charac- 
teristics includes the major suggestions from per- 
tinent literature. 

Readability—In constructing a test, the author 
should consider the two aspects of readability (4, 
9, 22): 

1. The reader - his experience, interests and
language facility,

2. The material - the interest level, the lan-
guage, the concepts, and the mechanical
features.


Reading in the Content Areas—This aspect of 
reading has significant implications for test con- 
struction: 

1. Since reading skills vary between content
areas and success in reading the materials
of one content area cannot be used as a cri-
terion of success in another content area,
test materials should be built from mater-
ials within a given content area (6, 7, 18, 27).

2. Since reading is a complex of many abili-
ties and skills which vary within one con-
tent area, specific skills should be ap-
praised (3, 10, 11, 12, 18, 27).

3. Since ability in critical reading comprehen-
sion cannot be predicted from ability in lit-
eral comprehension, a reading test should
include both (2, 4, 15, 20, 24, 25).


Mechanical Features—The following criteria 
are suggested in the literature for test construc- 
tion (1, 7, 28): 

1. The test materials should be valid and reli-
able.

2. The number of items appraising each skill
should be large enough to show the degree
to which the subject possesses that skill.

3. The directions should be clear and consist-
ent for each administration of the test.

4. Each multiple-choice item should have: (1)
at least five alternate responses, (2) one
best answer, (3) the correct answer ran-
domized, and (4) plausible distractors of
about equal length.


Summary of Procedure 


The following procedure was used in this 
study: 


1. A preliminary edition of The Intermediate
Reading Test: Social Studies, designed to ap-
praise both literal interpretation and specif-
ic critical reading comprehension skills, was
constructed and validated.

2. A preliminary study was conducted in which
the test was administered to one hundred and
forty-three children in grades four, five,
and six. The results were used to evaluate
the preliminary edition in terms of readabil-
ity and the discriminating power and internal
consistency of each test item.

3. The measure was revised and called the ex-
perimental edition of The Intermediate Read-
ing Test: Social Studies.

4. The experimental edition of the reading test
was administered to five hundred and thirteen
children not included in the preliminary
study.

a. The reliability of the experimental edition
was computed by means of the Kuder-
Richardson Estimate of Test Reliability.

b. The validity of each test item was evaluat-
ed by using (1) the Standard Error of the
Difference Between Proportions, (2) an es-
timate of the product-moment coefficient
of correlation based on the upper and low-
er 27% of the distribution, and (3) inspec-
tion of the total number of choices for each
distractor.

5. Two standardized tests were administered to
the population used in the major study: The
Gates Reading Survey (Level of Comprehen-
sion) to appraise ‘‘general’’ reading ability
and The Pintner General Ability Test (Verb-
al Series) to obtain verbal intelligence quo-
tients.

6. The product-moment method of correlation
was used to estimate the degree of relation-
ship between the four variables: intelligence,
‘‘general’’ reading ability, literal compre-
hension in social studies, and critical inter-
pretation in social studies.

7. Partial correlation was used to estimate the
degree of relationship between the three
types of reading ability (‘‘general’’ reading
ability, literal comprehension in social stud-
ies, and critical interpretation in social stud-
ies) when intelligence was partialled out.

8. Chi-square was used to determine the pres-
ence or absence of relationship between lit-
eral reading and each critical reading skill
in social studies.

9. The point-biserial method of correlation was 
utilized to estimate the degree of relationship 
between literal reading and each critical read- 
ing skill in social studies. 
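
The procedure above names the Kuder-Richardson estimate of reliability without saying which of the Kuder-Richardson formulas was applied. As a minimal sketch, assuming formula 20 and items scored 0 or 1, the computation could look like the following. The function name `kr20` and the small response matrix are illustrative only and are not data from the study.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores (one row per pupil)."""
    n_pupils = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_pupils
    # Variance of total scores, dividing by N as in the classical formula.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_pupils
    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_pupils
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical scores for 5 pupils on 4 items (not data from the study).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(toy), 3))
```

The estimate rises as the item variances become small relative to the variance of total scores, which is why it is read as a measure of internal consistency.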


Summary of Results 


Problem I: The degree of relationship between
intelligence and ‘‘general’’ reading ability, lit-
eral reading comprehension, and critical read-
ing interpretation in social studies, as estimat-
ed by the product-moment method of correla-
tion, was .83 ± .01, .72 ± .02, and .69 ± .02
respectively.
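
The error figure attached to each coefficient is not defined in this summary. A minimal sketch, assuming it is the large-sample standard error of r computed as (1 - r^2) / sqrt(N), with N = 513 children given the experimental edition, yields values that round to the three figures reported for Problem I. The function name is illustrative, and the formula is an assumption rather than the author's stated method.

```python
import math


def standard_error_of_r(r, n):
    """Large-sample standard error of a product-moment correlation (assumed formula)."""
    return (1 - r ** 2) / math.sqrt(n)


N = 513  # children who took the experimental edition
for r in (0.83, 0.72, 0.69):
    print(r, round(standard_error_of_r(r, N), 2))
```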


Problem II: The degree of relationship between
‘‘general’’ reading ability and literal and crit-
ical reading comprehension in social studies,
as estimated by the product-moment formula,
was .76 ± .02 and .64 ± .03 respectively.
When intelligence was partialled out, the cor-
relation coefficients were .42 and .17 respec-
tively.
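
The partialled-out coefficients can be checked against the zero-order correlations reported under Problems I and II. The summary does not state the formula, so the sketch below assumes the standard first-order partial correlation; the function and variable names are illustrative.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero-order coefficients as reported in the summary of results.
r_iq_general = 0.83
r_iq_literal = 0.72
r_iq_critical = 0.69
r_general_literal = 0.76
r_general_critical = 0.64

literal_partial = partial_r(r_general_literal, r_iq_general, r_iq_literal)
critical_partial = partial_r(r_general_critical, r_iq_general, r_iq_critical)
print(round(literal_partial, 2), round(critical_partial, 2))
```

Because the inputs are themselves rounded to two places, small discrepancies in the second decimal place between recomputed and published partial coefficients are to be expected.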


Problem III: The degree of relationship between
literal and critical reading interpretation in so-
cial studies, as estimated by the product-mo-
ment formula, was .61 ± .03. With intelli-
gence controlled, the correlation was .23.


Problem IV: The degree of relationship between 
success on each critical reading skill and the 
total literal reading score was computed by 
two methods: 

a. The twenty-three point-biserial coefficients
of correlation ranged from .45 to -.17 with
a ‘‘median’’ coefficient of .23 and the seven
‘‘combined’’ point-biserial coefficients
ranged from .28 to .06 with a median of .26
after those test items which failed to be sig-
nificant on the chi-square test of signifi-
cance were excluded. The critical reading
skill with the greatest degree of relation-
ship was ‘‘functional vocabulary’’ (.28), the
skill with the least was ‘‘extraneous idea’’
(.06).

b. The estimates of the product-moment coef-
ficients of correlation ranged from .61 to
-.27 with a ‘‘median’’ coefficient of .25.
Twenty-two of the critical reading test
items appeared to be significant on the basis
of three probability values.
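
The chi-square screening of items mentioned above can be illustrated with a fourfold table. The sketch below assumes pupils are split into high and low literal-reading groups and cross-classified by passing or failing one critical reading item; the split, the counts, and the function name are hypothetical, and no continuity correction is applied.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts (not from the study):
# rows = high / low literal score, columns = item passed / item failed.
chi2 = chi_square_2x2(30, 20, 15, 35)
CRITICAL_05 = 3.841  # chi-square critical value for 1 degree of freedom at the .05 level
print(round(chi2, 2), chi2 > CRITICAL_05)
```

An item whose chi-square falls below the chosen critical value shows no dependable relationship with literal reading and would be set aside before the coefficients are combined.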


Conclusions 


Within the limitations of this study as stated 
in Chapter I (see original thesis on file at Temple
University) the following conclusions appear to 
be valid: 


Problem I 


Verbal intelligence appears to be very highly
related to ‘‘general’’ reading ability (.83 ±
.01).

Verbal intelligence appears to be substantial-
ly related to the ability to comprehend liter-
ally in social studies (.72 ± .02).

Verbal intelligence appears to be substantial-
ly related to the ability to comprehend criti-
cally in social studies (.69 ± .02).


Problem II 


‘‘General’’ reading ability appears to be high-
ly related to literal reading interpretation of
social studies materials (.76 ± .02). When
intelligence is partialled out, the relationship
appears to be substantial (.41 ± .04).

‘‘General’’ reading ability appears to be sub-
stantially related to critical reading interpre-
tation in social studies (.64 ± .03). When in-
telligence is held constant, the relationship
appears to be low (.17 ± .04).


Problem III 


Literal reading comprehension appears to be
substantially related to critical reading com-
prehension in social studies (.61 ± .03). With
intelligence held constant, the relationship ap-
pears to be negligible (.23 ± .04).


Problem IV 
Each selected critical reading skill appears
to show a negligible or low relationship to the
ability to comprehend literally in social stud-
ies.


General Conclusions 


1. Reading comprehension in social studies ap-
pears to be a composite of many skills and
abilities which apparently function at various
levels of mental activity.

2. Literal and critical reading comprehension
in social studies appear to be relatively inde-
pendent abilities when intelligence is held
constant.

3. Individual critical reading comprehension
skills appear to be relatively independent of
the ability to comprehend literally in social
studies.

4. When intelligence is held constant, critical
reading comprehension in social studies ap-
pears to be virtually independent of ‘‘gen-
eral’’ reading ability; literal reading com-
prehension, relatively independent of ‘‘gen-
eral’’ reading ability.

5. Group tests of ‘‘general’’ reading ability and
group tests of verbal intelligence tend to
measure common factors.


Implications 


Several school practices need to be consid- 
ered thoughtfully if reading instruction in social 
studies is to be improved: 

1. The use of a ‘‘general’’ reading test to
identify all reading needs.

2. The practice of teaching reading as a ‘‘un- 
itary’’ ability in materials taken from the 
field of literature. 

3. The use of a group, verbal intelligence 
test to estimate the intelligence of all pu- 
pils. 


A reading test appraising ‘‘general’’ reading 
ability does not identify all reading needs. By 
definition, it is ‘‘survey’’ in nature and lacking 
in specificity. Frequently it is limited to a low-
level type of interpretation. Furthermore, the
usual reading test is composed primarily of ma- 
terials from the field of literature. 

Such a test is inadequate, in the first place, 
because reading comprehension cannot be con- 
fined to the interpretation of the sense-meaning 
in literature materials. Reading is a complex 
process embracing many levels of interpretation 
and many different skills and abilities. 

In the second place, the reading skills and 
abilities necessary to adequate interpretation 
vary considerably within and between the various 
subject-matter fields. Accordingly, specific 
needs in reading comprehension, particularly in 
critical reading comprehension, should be ident- 
ified by means of informally constructed tests 
and daily appraisal during teaching sessions. 

The identification of specific needs in literal 
and critical reading comprehension is merely 
the first step in improving the reading skills of 
any school population. Developmental instruc- 
tion in reading skills and abilities needs to be 
provided systematically in all content areas. 
More emphasis should be placed upon higher 
levels of reading interpretation to avoid verbal-
ism. The comparison of the literal and critical
scores in this study indicates that the pupils
tested lacked the ability to interpret social
studies materials critically. It is the responsi-
bility of each classroom teacher to provide for
the systematic training required to develop such
abilities.

To appraise a retarded reader’s mental ca- 
pacity by means of a group, verbal intelligence 
test is a highly questionable procedure. The 
amount of relationship between the group, verb- 
al intelligence test and the group reading test 
in this study as well as in other studies implies
that one can be predicted from another with con-
siderable accuracy. Therefore, a child unable
to read cannot perform at or near his mental ca- 
pacity level on such an intelligence test. When 
reading retardation is apparent, it is advisable 
to use an individual measure of mental capacity. 

Another major need in education is the con-
struction of reliable and valid measures to ap-
praise critical reading skills in all subject mat-
ter areas at the elementary school level. Criti-
cal reading skills are not included in content-
area tests available now. Such a test should
yield a score on each critical reading compre- 
hension skill so that specific needs can be ident- 
ified. 


Suggestions for Further Research 


Further inquiry into reading comprehension 
appears to be warranted. The following prob- 
lems are in need of investigation: 

1. The relationships between, and interrela- 
tionships among the critical reading com- 
prehension skills in social studies. 

2. The investigation of other skills necessi- 
tating a higher level of interpretation in so- 
cial studies. 

3. A factorial analysis of critical reading com- 
prehension in social studies. 

4. The investigation of literal and critical 
reading comprehension within other content 
areas and between content areas. 

5. Investigations on the development of each 
critical reading comprehension skill. 

6. Studies to evaluate effective methods 
for developing critical thinking in read- 
ing. 

7. The construction of valid measures to ap- 
praise literal and critical reading compre- 
hension skills in all content areas at the 
elementary school level.


LITERAL AND CRITICAL READING
IN SCIENCE


ETHEL S. MANEY 
Temple University 


The Problem and Its Scope 


THE MAJOR purpose of this investigation was 
to determine the relationships between selected 
factors of reading comprehension as revealed by 
fifth-grade children. In order to obtain data on 
this problem, it was necessary to evaluate the 
experimental edition of an intermediate reading 
test in science. Specifically, the problems con- 
sidered were: 


1. What is the relationship between literal 
and critical reading comprehension of science 
materials? 

2. What is the relationship between verbal in- 
telligence and 

a. ‘‘Survey’’ or ‘‘general’’ reading compre-
hension?

b. Literal reading comprehension in science? 

c. Critical reading comprehension in science? 


3. What is the relationship between reading 
comprehension as measured by a standardized
reading survey test and that appraised by 

a. A literal reading test in science? 

b. A critical reading test in science? 


4. To what degree does each selected critical 
reading skill tend to be independent of the ability 
to read literally when science materials are 
used? 


Justification of the Study 


Recent investigations have cast doubt upon 
certain traditional school practices: (a) employ- 
ing a single reading test to measure reading abil- 
ity in all situations, and (b) considering the de- 
velopment of critical reading skills as a concom- 
itant of intelligence, maturation, and normal 
school progression. 

One of the assumptions underlying these prac-
tices supports the unitary concept of reading abil-
ity, that is, that reading ability is so generalized
that a reader can obtain a commensurate degree
of interpretation regardless of the content or pur-
pose.

Davis (10), Humber (21), Shores (28), and 
Swenson (30) have produced experimental evi-
dence at the secondary school level which re-
futes this entity concept of reading. They found
that (a) content and purpose dictate the skills to 
be employed in reading a particular selection, 
(b) ability to read in one content area does not 
ensure equal success in another, and (c) ability
to interpret content literally does not guarantee 
commensurate ability in a higher level of inter- 
pretation. 

There appears to be need for experimental 
data at the elementary school level as well. 
This study is one attempt to provide information 
concerning certain skills used in reading sci- 
ence materials at that level. 


Limitations of the Study 


Experimental Design—This study was primar-
ily a statistical analysis of reading comprehen-
sion as exhibited by a representative sampling
of fifth-grade children. It was also concerned
with the validation of the experimental edition of
a reading test in science. To provide additional
data, a test of verbal intelligence and a stand-
ardized reading test were administered.

Statistical Design—The results of the tests 
administered were statistically analyzed by 
large sample procedures which included inter- 
correlation, chi-square, and point bi-serial cor- 
relation. The reading test in science was evalu- 
ated by several measures of item-analysis. 

The Population— Five hundred thirteen sub- 
jects were used in the study. 

1. School Background: All subjects were in
the last month of the fifth grade during
June 1949. They were members of unse-
lected fifth-grade classes from four sub-
urban Philadelphia and four urban Phila-
delphia schools. Eighteen different clas-
ses were represented in the final popula-
tion, nine urban and nine suburban.

2. Age: The chronological age range was
from 10-0 years to 14-6 years, inclusive.

3. Sex: Both sexes were represented in the
study.

4. Intelligence: The verbal intelligence quo-
tient range for the population was from 57
to 158, inclusive.

5. Reading Grade: The standardized reading
grade ranged from less than 2.5 through
10.3.

6. Final Population Criteria: Only subjects
who completed all tests were included
in the final population. Of the six hundred
eleven children tested, five hundred thir-
teen had complete results. Two hundred
sixty-nine of the final population fell with-
in plus or minus one standard deviation of
the mean on both the intelligence and the
standardized reading tests.


Tests Administered—To obtain a measure of 
reading comprehension on an untimed power test, 
The Gates Reading Survey, Grades 3 to 10, Form
I, Level of Comprehension (published by the Bur-
eau of Publications, Teachers College, Colum-
bia University), was administered. To appraise
literal and critical reading interpretation of sci- 
ence material, the experimental edition of the 
Intermediate Reading Test: Science was used. 
To measure verbal intelligence, the Pintner 
General Ability Test, Verbal series, Intermed- 
iate, Form A (published by World Book Com- 
pany) was given. 


Terminology 


1. Literal reading is the ability to obtain a
low-level type of interpretation by using only the
information explicitly stated.

2. Critical reading is the ability to obtain a
level of interpretation higher than that needed 
for literal interpretation. For this study the fol- 
lowing critical reading skills were employed: 


a. Functional Vocabulary tests the reader’s 
background of experience in reference to 
a concept used in the selection. 

b. Semantic Variation of Vocabulary tests the 
reader's ability to identify a similar usage 
of a given word from the selection. 

c. Central Theme tests the ability to distin-
guish the central topic of the selection
from subordinate ones.

d. Key Idea tests the ability to identify the key,
or most important, idea in the story.

e. Inference tests the ability to draw a specif-
ic conclusion from facts explicitly stated.

f. Generalization tests the ability to identify
a general conclusion or principle from in-
formation implicitly stated.

g. Problem Solving tests the ability to apply
information from the selection to a prob-
lematic situation.

h. Association of Ideas tests the ability to see
the relationship among ideas in a series.

i. Analogy tests the ability to perceive rela-
tionship between two pairs of ideas.

j. Antecedent tests the ability to recognize
the word or words to which a selected pro-
noun refers.

k. Sequence tests the ability to determine a
time sequence.

l. Extraneous Idea tests the ability to determ-
ine relevancy of ideas to a particular se-
lection.

m. Following Directions tests the reader’s
ability to evaluate information as a prelim-
inary step to executing or rejecting print-
ed instructions.

n. Visualization tests the reader’s ability to 
interpret a graphic representation of an 
idea verbally presented in the selection. 


3. ‘‘Survey’’ Reading Comprehension is a
measure of understanding based on the results 
of a reading test which uses content largely from 
the field of literature. For this study, the com- 
prehension section of the Gates Survey was used. 

4. ‘‘General’’ Reading Comprehension is used
synonymously with ‘‘Survey’’ Reading Compre-
hension in this study.


A Review of Kindred Literature 


Few studies have been reported that are di- 
rectly relevant to a study of the reading compre- 
hension of fifth-grade children. Nevertheless, 
the conclusions from many other studies as well 
as the opinions of recognized authorities have 
contributed to the assumptions upon which this 
study was based. Accordingly, they are includ-
ed in this survey of kindred literature which is
here summarized in terms of the major conclu- 
sions reported. 


Reading as a Thinking Process 


The following assumptions concerning read-
ing as a thinking process appear to be supported
by the literature reviewed (32, 33, 34, 5, 26):


1. Reading and thinking are inseparable pro-
cesses: There is no reading without think-
ing.

2. Critical thinking is that type of thought
which utilizes the higher mental proces-
ses. When critical thinking is used in the
reading situation, the process is called
‘‘critical reading’’.


Development and Appraisal of Critical Thinking


A review of the research has tended to con- 
firm the following assumptions: 


1. Critical thinking has a number of compon-
ents (14, 25).

2. The elementary mechanism essential to
critical thinking develops gradually and is
usually present in the individual by the age
of seven (6, 8, 19).

3. Children observe the same general patterns
of thinking as adults but are limited in
reaching an equal degree of ability by their
lack of experience (18).

4. Growth in certain components assumed to
be inherent in critical thinking can be af-
fected by instruction (12, 14, 25).

5. Critical thinking abilities and those meas-
ured by intelligence tests are not identical
(14, 25).

6. Certain critical thinking abilities can be
measured reliably and validly by paper-and-
pencil tests (13, 14, 25).


Subjective Analysis of Reading Comprehension 
Skills 


Four major approaches were made subjective- 
ly to the study of reading comprehension skills: 


1. The skills used by good readers were ana- 
lyzed (3, 15). 

2. The errors and difficulties exhibited by 
poor readers were examined (20, 4). 

3. The reading task was structured (15, 23). 

4. The skills specific to each content area 
were determined (15, 23). 


The findings from these investigations re- 
vealed that: 


1. A knowledge of many skills and abilities is
essential to successful reading.

2. Reading skills have been organized in terms
of reader purpose and of depth of compre-
hension.

3. Reading skills and abilities needed in sev-
eral content areas exhibit certain similar-
ities but, on the whole, tend to be specific
to each content area.


Experimental Analysis of Reading Comprehen- 
sion 


The evidence yielded by the experimental
studies of reading comprehension seems to sup-
port the following major assumptions:

1. Reading is a composite of many compon-
ents and skills, many of which can be iso-
lated, identified, and manipulated to serve
in specific reading situations (10, 14).

2. Reading is essentially a thinking process
and requires for effectiveness all the ele-
ments needed in critical thinking (10, 14,
26).

3. The ability to read a passage with literal
interpretation does not guarantee the abil-
ity to interpret that selection critically (2,
10, 11, 13, 35).

4. The ability to read successfully in one con-
tent area does not ensure equal success in
another (21, 23, 28).

5. Certain reading and thinking skills are
responsive to training (14, 27).

6. In order to obtain a valid measure of read-
ing ability in a given content area, selec-
tions from that field must be used (17, 30).

7. The concept of ‘‘general reading ability’’
is not supported by scientific evidence (10,
28).


Test Construction 


A review of the literature on test construc- 
tion led to the following conclusions: 


1. Most standardized reading tests are de-
signed to measure the literal rather than
the critical interpretation of the printed
material (5, 10, 13).

2. To date, in the research on test construc-
tion, reading and thinking have been treat-
ed as separate entities (5, 10).

3. No standardized reading tests are now
available at the elementary school level
which measure children’s ability to think
critically about printed materials (24).

4. Test content should resemble as closely as
possible the major features exhibited by
the instructional materials of the content
area being measured (17, 30).

5. The structural characteristics of the test
must be well controlled (7).

6. The major objective factors which influ-
ence the readability of printed materials
include those of (a) familiarity of vocabu-
lary, (b) average sentence length, and (c)
complexity of sentence structure (9, 16).


Summary of Procedure 


A summary of the procedure followed in this 
study is outlined below: 


1. Constructed the preliminary edition of The
Intermediate Reading Test—Science in order
to obtain a measurement of literal and crit-
ical reading achievement in science.

2. Appraised the reliability of this edition by
administering it to 143 children of grades
four, five, and six in a preliminary study.

3. Used item analysis on the test results.

4. Revised the test to serve the major purposes
of the study. Thereafter referred to the re-
vised edition as the experimental edition of
The Intermediate Reading Test—Science.

5. Administered the experimental edition to a
different population consisting of 513 fifth-
grade children.

6. Administered to the same population two
standardized tests:

a. Gates Reading Survey (Level of Compre-
hension) to determine the ‘‘general’’ read-
ing ability.

b. The Pintner General Ability Test (Verb-
al Series) to obtain an index of verbal in-
telligence.

7. Estimated the reliability of the experiment-
al edition of The Intermediate Reading Test—
Science by using the Kuder-Richardson
formula with the data obtained on the total
population.

8. Studied the reliability of each literal and crit-
ical reading test item by:

a. Inspecting the responses of the total pop-
ulation.

b. Computing the Standard Errors of the Dif-
ference Between Proportions with the
scores of the ‘‘good’’ and the ‘‘poor’’
readers (the upper and lower 27% of the
population).

c. Estimating the Pearson Product-Moment
correlations between achievement in lit-
eral reading and in each critical reading
skill, using the upper and lower 27% of
the distribution.

9. Computed the Pearson Product-Moment in-
tercorrelations among the test variables:
literal reading achievement in science, crit-
ical reading achievement in science, ‘‘gen-
eral’’ reading achievement, and verbal intel-
ligence quotients.

10. Investigated the presence or absence of the
relationship between achievement in literal
reading and the passing and failing on each
critical reading test item by employing the
Chi-Square test of significance.

11. Determined the Point-Biserial correlation
between achievement on total scores of the
literal reading section of the experimental
edition and achievement on each critical
reading skill.
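
The item analysis described above compares the ‘‘good’’ and ‘‘poor’’ readers, taken as the upper and lower 27 percent of the population. A minimal sketch of that comparison follows, assuming a pooled standard error for the difference between two proportions. The function names, the ten toy scores, and the pass rates are hypothetical and are not taken from the study.

```python
import math


def tail_groups(total_scores, fraction=0.27):
    """Indices of the highest- and lowest-scoring `fraction` of pupils."""
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    k = max(1, round(len(total_scores) * fraction))
    return order[-k:], order[:k]


def critical_ratio(p_upper, p_lower, n_upper, n_lower):
    """Difference between two proportions divided by its pooled standard error."""
    pooled = (p_upper * n_upper + p_lower * n_lower) / (n_upper + n_lower)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_upper + 1 / n_lower))
    return (p_upper - p_lower) / se


# Hypothetical total scores for ten pupils (not data from the study).
scores = [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]
upper, lower = tail_groups(scores)
print(sorted(upper), sorted(lower))

# With 513 pupils, each tail group holds round(513 * 0.27) = 139 children.
# Hypothetical item: 80% of the upper group and 50% of the lower group pass.
z = critical_ratio(0.80, 0.50, 139, 139)
print(round(z, 2))
```

An item that the upper group passes much more often than the lower group gives a large ratio and is taken to discriminate between good and poor readers.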
Summary of Results 


Problem I: Literal and Critical Reading Compre-
hension


1. The Pearson Product-Moment formula
yielded a correlation of .67 ± .02 between
literal and critical reading comprehension
in science. With intelligence held constant,
the correlation was .34.
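
The figure for intelligence held constant can be checked against the zero-order coefficients in this summary, assuming the standard first-order partial correlation formula was used. Here L stands for literal reading, C for critical reading, and I for verbal intelligence; these symbols are introduced only for illustration. The values .75 and .67 are the correlations of intelligence with literal and critical reading reported under Problem II.

```latex
r_{LC \cdot I}
  = \frac{r_{LC} - r_{LI}\, r_{CI}}{\sqrt{(1 - r_{LI}^{2})(1 - r_{CI}^{2})}}
  = \frac{.67 - (.75)(.67)}{\sqrt{(1 - .75^{2})(1 - .67^{2})}}
  = \frac{.1675}{\sqrt{(.4375)(.5511)}}
  \approx .34
```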


Problem II: ‘‘Intelligence’’ and Reading Compre-
hension


2. The Pearson Product-Moment formula
yielded a correlation of .83 ± .01 between
the Pintner Intelligence Test (Verbal) and
the Gates Reading Survey (Level of Com-
prehension).

3. The Pearson Product-Moment formula
yielded a correlation of .75 ± .02 between
the Pintner Intelligence Test (Verbal) and
the literal reading section of the Intermed-
iate Reading Test—Science.

4. The Pearson Product-Moment formula
yielded a correlation of .67 ± .02 between
the Pintner Intelligence Test (Verbal) and
the critical reading section of the Inter-
mediate Reading Test—Science.


Problem III: ‘‘General’’ Reading Comprehension
and Literal and Critical Reading Comprehension
in Science


5. The Pearson Product-Moment formula
yielded a correlation of .75 ± .02 between
the Gates Reading Survey (Level of
Comprehension) and the literal reading section
of the Intermediate Reading Test-Science.
With intelligence held constant, the
correlation was .35.

6. The Pearson Product-Moment formula
yielded a correlation of .60 ± .02 between
the Gates Reading Survey (Level of
Comprehension) and the critical reading
section of the Intermediate Reading
Test-Science. With intelligence held constant,
the correlation was .11.
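The three values reported ‘‘with intelligence held constant’’ can be checked against the zero-order correlations given in Problems I to III. The sketch below assumes the standard first-order partial correlation formula; the report does not name the formula, and the variable names are illustrative only. Applied to the rounded coefficients, the formula reproduces the reported .34, .35 and .11.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Zero-order correlations as reported in the summary of results.
r_literal_critical = 0.67   # Problem I
r_iq_gates = 0.83           # Problem II, item 2
r_iq_literal = 0.75         # Problem II, item 3
r_iq_critical = 0.67        # Problem II, item 4
r_gates_literal = 0.75      # Problem III, item 5
r_gates_critical = 0.60     # Problem III, item 6

# Each call holds verbal intelligence (the Pintner test) constant.
literal_critical = partial_r(r_literal_critical, r_iq_literal, r_iq_critical)
gates_literal = partial_r(r_gates_literal, r_iq_gates, r_iq_literal)
gates_critical = partial_r(r_gates_critical, r_iq_gates, r_iq_critical)

print(round(literal_critical, 2))  # 0.34
print(round(gates_literal, 2))     # 0.35
print(round(gates_critical, 2))    # 0.11
```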


Problem IV: Literal Reading Achievement and 
Achievement in Each Critical Reading Skill 
(Science) 


7. The Point-Biserial formula yielded
correlations ranging from -.15 to +.47 between
achievement on the total literal reading
section and performance on each critical
reading test item. Table XIII presents the
results.
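For readers unfamiliar with the point-biserial coefficient, the sketch below shows one common way of computing it between a pass/fail item and a continuous total score. The formula used (difference of group means over the standard deviation of all scores, times the square root of pq) is the usual textbook form and is assumed here; the six scores are hypothetical and are not data from the study.

```python
import math

def point_biserial(passed, scores):
    """Point-biserial correlation between a 0/1 item and a continuous score."""
    n = len(scores)
    group1 = [s for s, p in zip(scores, passed) if p]
    group0 = [s for s, p in zip(scores, passed) if not p]
    if not group1 or not group0:
        raise ValueError("both pass and fail groups must be non-empty")
    mean_all = sum(scores) / n
    # Standard deviation of all scores, dividing by n (not n - 1).
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    if sd == 0:
        raise ValueError("scores have zero variance")
    p = len(group1) / n
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    return (mean1 - mean0) / sd * math.sqrt(p * (1 - p))

# Hypothetical example: literal reading totals and pass/fail on one critical item.
literal_totals = [6, 5, 4, 3, 2, 1]
item_passed = [1, 1, 1, 0, 0, 0]
r_pb = point_biserial(item_passed, literal_totals)
print(round(r_pb, 3))
```

Computed this way, the coefficient equals the ordinary Pearson correlation between the 0/1 item and the total score.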


Conclusions 


Within the limitations stated, the following 
conclusions on each problem seem to be valid:


Problem I: 
1. There is a substantial relationship
between literal and critical reading
comprehension in science.


Problem II: 


2. There is a very high relationship between
verbal intelligence and ‘‘general’’ reading
ability.

3. There is a high relationship between
verbal intelligence and proficiency in literal
reading in science.

4. There is a substantial relationship between
verbal intelligence and proficiency in
critical reading in science.


Problem III:


5. There is a high relationship between
‘‘general’’ reading comprehension and literal
reading comprehension of science materials.

6. There is a substantial relationship between
‘‘general’’ reading comprehension and
critical reading comprehension of science
materials.


Problem IV: 


7. There is a very low or negligible relation- 
ship in science between proficiency in lit- 
eral reading and in each of the respective 
critical reading skills. 


The conclusions for this study may be sum- 
marized as follows: 


1. Critical reading comprehension in science is 
a complex of skills or abilities, each of which 
is relatively independent of the ability to read 
literally. 

2. Proficiency in critical reading of science
materials cannot be predicted from scores
obtained (a) on literal reading tests in science,
(b) on group tests of verbal intelligence, or
(c) on ‘‘general’’ reading tests.

3. Proficiency in literal reading interpretation
of science materials may be predicted with a
fair degree of accuracy from scores on group
tests of verbal intelligence and ‘‘general’’
reading tests.

4. Group tests of verbal intelligence and ‘‘gener- 
al’’ reading tests tend to measure many com- 
mon abilities. 


Implications 


These four general conclusions seem to justi- 
fy the following implications for education: 


In planning an analysis program, both curriculum
workers and the personnel responsible for
the testing program need to give serious thought
to the limitations of a group test of verbal intel- 
ligence for measuring the capacity of retarded 
readers. Consideration also needs to be given 
to the inadequacy of either the standardized read- 
ing tests available at the elementary school level 
or the group tests of verbal intelligence for
predicting proficiency in critical reading
comprehension in science. School personnel should
realize that, since critical reading comprehension
embraces relatively independent abilities,
a valid diagnosis of that complex can be made 
only by measuring proficiency in each specific 
critical reading skill and by using materials of 
specialized content. Accordingly, a complete 
elementary school analysis program should in- 
clude, in addition to other tests, (1) an instru- 
ment for measuring the capacity of retarded 
readers, and (2) tests of critical reading
comprehension that would yield a measure of
proficiency in each specific critical reading skill in
each given content area. 

There is a crucial need for a new type of in- 
strument to measure reading comprehension at 
the elementary school level. The proposed in- 
strument necessarily would include items de- 
signed to measure the relatively independent abil- 
ities inherent in critical reading comprehension 
of a particular content. The obtained scores 
could not be expressed as a composite score but 
probably could be presented in profile form. 
This would show the relative strength or weak- 
ness in each specific critical reading skill and, 
therefore, could serve as a guide in the prepar- 
ation of the instructional program in the various 
content areas. 

Classroom teachers should recognize that, 
since critical reading ability consists of relative- 
ly separate abilities, the best procedure for de- 
veloping critical reading proficiency is by pro- 
viding instruction in each specific skill. For op- 
timum results, this instruction needs to be sys- 
tematic and direct, e.g., in order to develop 
problem-solving skill in science, opportunities 
for solving problematic situations by using sci- 
ence content should be afforded. By improving 
ability in each specific critical reading skill, 
the general level of critical reading ability 
could be raised. 

Another implication from the study may be 
stated as a caution to classroom teachers. 
From the results obtained in this study it ap- 
pears obvious that this representative population 
of elementary school children tended to be low 
achievers in critical reading comprehension in 
science. This was equally true for children of 
superior intelligence as for those of low intelli- 
gence. It appears vital to urge the classroom 
teacher not to take the development of critical 
reading comprehension for granted as a
concomitant of normal or superior intelligence, but to
realize that it is a skill that needs development.
Consequently, the teacher should provide, for 
all children, systematic instruction in each of 
the critical reading skills needed for successful 
interpretation of each specialized content. 


Suggestions for Further Research 


A more exhaustive analysis of critical reading
comprehension seems to be in order. Among
the problems that need investigation are the fol- 
lowing: 


1. Construction of instruments for diagnos- 
ing critical reading comprehension in the 
various content areas. 

2. Study of the relationship between literal 
and critical reading comprehension (a) at 
other grade levels, and (b) in other con- 
tent areas. 

3. Investigation of the effect of systematic,
directed instruction in critical reading
skills (a) at each elementary grade level,
and (b) in each content area.

4. A factorial analysis of critical reading
comprehension.


CHILDREN’S PERCEPTIONS OF RELATIONSHIPS
AMONG THEIR FAMILY AND FRIENDS


IVAN N. MENSH and JOHN C. GLIDEWELL** 
St. Louis, Missouri 


Introduction 


UNTIL RECENTLY, the personal and social 
development of children has been studied princi- 
pally from the point of view of adults. It has be- 
come increasingly evident that not only are adult 
perceptions of child behavior important to our 
understanding, but some insight is also necessary
into children’s perceptions of themselves,
their peers and the adult world. A number of 
techniques have been developed to assess these 
perceptions, ranging from the clinical interview 
through play and other projective methods to 
more structured tests. For example, children 
in the age range of four to fourteen years have 
been studied variously by Del Solar (2), Griffiths 
(5), Herbst (6), Mott (9), and Rogers (10), by di- 
rect observation, interview, questionnaire and 
other test techniques. In these studies there 
have been reported children’s perceptions of 
a) child care and control (6); b) household, social
and economic relationships and activities (6); 
c) behavior difficulties of children as they them- 
selves see them (5); and d) self and other rela- 
tionships (10). In one of these investigations, 
Griffiths studied 900 school children ranging in 
age from 6 to 14, in a research program designed
to test three major hypotheses. These
were: 


1. That young children of the early elementary
school level are aware mainly of those
difficulties characterized by overt and aggressive
behavior, but that with increasing age they
become increasingly aware of difficulties of a
submissive or withdrawing nature.


2. That as children grow older, parents’,
teachers’, and children’s judgments of behavior
difficulties are in greater agreement.


3. That children from the middle socio-economic
groups are essentially conformists; they show
fewer overt and aggressive behavior
difficulties, but more difficulties of a
submissive or withdrawing nature, than children
from the upper and lower socio-economic
groups (5).


*All footnotes will be found at end of article.


Data to test these hypotheses were gathered 
from parents and teachers by questionnaire, and 
from the children by interview. The behaviors 
analyzed were categorized as aggressive, delin- 
quent-related, withdrawing, or non-compliant. 
The last applied only to the home situation;
therefore, teachers’ replies were not scored for
this category. In general, the findings supported 
the hypotheses. 

In a similar study but with a much smaller 
sample (36 children ranging in age from 6 to 12), 
Del Solar (2) reported that children perceive their
teachers’ goals for the children as improvement
in academic achievement and in classroom
behavior. In relation to parents, the children reported
their recognition of 1) parental authority, 2) their
own personal responsibility, and 3) the need for
improved sibling relationships. Such data have
suggested the significance of children’s perceptions
as an area affording new insights into behavior.


I. Subjects and Family Structure Study Procedure 


As part of a series of classroom observations 
in a larger project (3,4) and in studying children’s
perceptions of family structure and interrelation- 
ships, the method of asking the children to list 
their family members, best boy friend, and best 
girl friend, and to rank their preferences among 
them was adopted as a technique for sampling 
childhood perceptions of the family constellation. 
During the same experimental session in which 
these data were obtained, the children also re- 
sponded to a series of sociometric choices (cf. 
Part II in this study) within their respective class- 
room units. The children were studied by a team 
of four observers in a regularly scheduled morn- 
ing session, during which the teacher was out of 
the classroom. At the time of observation the 
teachers were asked to rate the children on a
four-point scale of general adjustment. In addition,
there were available for study reports from 1)
the school records, 2) the mental health service
workers in contact with the mothers and with
school personnel, and 3) in a few cases (N = 6),
a Child Guidance Clinic. Also, at approximately
the time of the observations in the classrooms,
the mothers were interviewed in their homes by
trained interviewers during a two-hour period.
Thus, data from 1) direct observation of the
children in the social structure of the school
classroom, 2) their mothers in the home
situation, 3) the teachers, and 4) the mental health
workers were gathered independently. The
latter two independent sources of data were used
as criteria against which to evaluate the
children’s behavior. These raters were instructed
in the definitions of general adjustment level
developed from Ullmann’s earlier study (13).


1. A child who is unusually well-adjusted in his
relationships with others and in his
accomplishments.


2. A happy child who gets along well and
accomplishes reasonably well the things that
usually go with his age and level of development.


3. A child who is not so happy as he might be;
has moderate difficulties in getting on;
growing up presents something of a struggle.


4. A child who now has, or at his present rate
is likely sooner or later to have, serious
problems of adjustment and needs or may
need special help or care because of such
problems.


The sample under study consisted of third-grade
children whose level of reading and writing
necessitated a minimum demand on these
skills. For the family structure perceptions,
the words for family members (‘‘father’’,
‘‘mother’’, etc.) were printed on the
blackboard and copied by the children under the
direction of several observers. ‘‘Family’’ was
defined as all children and adults living in the home.
‘‘Roomer’’ and ‘‘boarder’’ also were included
where necessary, as were grandparents and
other relatives who lived as members of the
household. The family was listed according to
age, including ‘‘me’’, for all subjects. At the
bottom of the prepared form, two spaces were
assigned, one for ‘‘best boy friend in the room’’,
and the second for ‘‘best girl friend in this
room’’. Following this reporting of ‘‘family
and friends’’, each child individually reviewed
his list with an observer to insure accuracy of
report.


The second step in this method of sampling
childhood behavior consisted of each child
rank-ordering his preferences for all family
members and for his two best friends, rating the most
preferred individuals as ‘‘1’’, the next as ‘‘2’’, 
etc. Again his reporting was individually checked 
by one of the several observers. There were oc- 
casional instances in which a girl or boy was re- 
luctant to indicate ‘‘best friend’’ of the opposite 
sex but this was overcome by quietly talking with 
the child to complete the ‘‘game’’. When complet- 
ed, the data obtained by this method included lists 
and preferences of family and friends for a total 
of 91 third-grade youngsters from three class- 
rooms, one from each of three adjacent public 
school districts. Although, in their rankings, 13 
of the 91 children had incorrectly followed in- 
structions, ten sets of ratings were corrected 
without difficulty and only three were unusable in 
the data analysis. 

The data permitted the testing of several hy- 
potheses which had been developed (3,4) in the 
larger project. In an earlier study, Rogers (10) 
had reported some of these hypotheses and tested 
them in the development of his ‘‘Test of Personality
Adjustment’’.¹ The first stated that degree of
disturbance in the child as assessed by teacher 
and trained mental health worker was related to 
his perceptions of the family constellation. Thus, 
it was hypothesized that a child who rates one
parent as ‘‘1’’ and the other as ‘‘3’’ or more would
show greater disturbance than one who rated the 
parents in ‘‘1-2’’ order. Similarly, children who 
preferred friends to sibs, and, also, children 
who rated those sibs just next to them in birth or- 
der as least preferred (sibling rivalry) among the 
sibs and friends would be rated as disturbed. 

Along another dimension of the present study, 
the hypothesis was formulated that size of family 
would be related to the degree of disturbance in 
the child. The analyses were designed also to 
test the hypothesis that sex of the child would
differentially contribute to the family and friend
preferences. Finally, as a result of a previous study
(1), it was hypothesized that the presence of grand- 
parents in the home would be related to disturb- 
ance in the child. There were one or both grand- 
parents in ten of the 88 family homes, perhaps 
constituting a subculture with psychological char- 
acteristics differentiating these homes from the 
other 78 homes. 

A scoring system was devised to include the 
variations of preferences within the family, based 
upon a total score of ten from which a point was 
subtracted for each of the following rank orders: 


Father and/or mother rated other than 1 
or 2; 

Friends preferred to grandparent(s); 
Friends preferred to parents and/or sibs; 
Other adults preferred to grandparents, 
parents and/or sibs; 

Sib next to subject least preferred. 
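
The five deductions listed above can be sketched as a small scoring function. The data layout (a dictionary of ranks, a dictionary of role labels, and a list naming the sibs adjacent in birth order) is hypothetical, as are the names used. Two readings are assumed because the text leaves them open: ‘‘A preferred to B’’ is taken to mean that at least one member of A holds a better rank than at least one member of B, and ‘‘least preferred’’ is taken to mean holding the last rank in the whole list.

```python
def family_preference_score(ranks, roles, next_sibs=()):
    """Score one child's rankings: 10 minus one point per listed rank order.

    ranks     -- name -> rank given by the child (1 = most preferred)
    roles     -- name -> "father", "mother", "sib", "grandparent",
                 "other_adult" or "friend" (hypothetical labels)
    next_sibs -- names of the sibs adjacent to the child in birth order
    """
    def ranks_of(*wanted):
        return [ranks[name] for name in ranks if roles[name] in wanted]

    def preferred(group_a, group_b):
        # Assumed reading: someone in A outranks someone in B.
        return bool(group_a and group_b) and min(group_a) < max(group_b)

    friends = ranks_of("friend")
    parents = ranks_of("father", "mother")
    sibs = ranks_of("sib")
    grandparents = ranks_of("grandparent")
    other_adults = ranks_of("other_adult")
    last_rank = max(ranks.values())

    deductions = [
        any(r not in (1, 2) for r in parents),                    # parent not 1 or 2
        preferred(friends, grandparents),                         # friends over grandparents
        preferred(friends, parents + sibs),                       # friends over parents/sibs
        preferred(other_adults, grandparents + parents + sibs),   # other adults over family
        any(ranks[name] == last_rank for name in next_sibs),      # next sib ranked last
    ]
    return 10 - sum(deductions)

# Hypothetical child showing the modal order: mother, father, sibs, friends.
roles_a = {"mother": "mother", "father": "father", "older_sister": "sib",
           "younger_brother": "sib", "boy_friend": "friend", "girl_friend": "friend"}
ranks_a = {"mother": 1, "father": 2, "older_sister": 3,
           "younger_brother": 4, "boy_friend": 5, "girl_friend": 6}
score_a = family_preference_score(ranks_a, roles_a, ["older_sister", "younger_brother"])

# Hypothetical child who places a friend above the mother and the brother last.
roles_b = {"father": "father", "mother": "mother", "brother": "sib",
           "boy_friend": "friend", "girl_friend": "friend"}
ranks_b = {"father": 1, "boy_friend": 2, "mother": 3, "girl_friend": 4, "brother": 5}
score_b = family_preference_score(ranks_b, roles_b, ["brother"])

print(score_a, score_b)
```

Under these assumptions the first child keeps the full ten points and the second loses three.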
The empirical scoring system just described
may be contrasted with a similar one developed
by Rogers. The children’s ratings were scored
by both systems. Rogers’ scores constituted
one of four subtests which contributed to a
‘‘family maladjustment score’’ in evaluating
responses to his ‘‘Test of Personality Adjustment.’’
The test manual recommended the following
scoring system (10):


If there are two or more sibs and one of the 
sibs next to the subject is given the highest 
number in the family, 1 point. 


If one of the friends is given a lower number 
than some member of the family, 2 points. 


If parents are separated by two ratings (e.g., 
mother rated ‘‘1’’, father rated ‘‘3’’), 2 points. 


If parents are separated by more than two rat- 
ings, 4 points. 


If parents receive highest number, 2 points. 
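
The rules quoted from the manual can likewise be sketched in code, here with points added rather than subtracted. The data layout and names are hypothetical. Several readings are assumptions, since the wording admits more than one: ‘‘highest number in the family’’ is taken over family members only, the last rule is taken to apply when a parent holds the highest number in the whole list, and the parent-separation rules are applied only when both parents are ranked.

```python
def rogers_family_score(ranks, roles, next_sibs=()):
    """Points under the quoted rules; a higher total means more maladjustment.

    ranks     -- name -> rank given by the child (1 = most preferred)
    roles     -- name -> "father", "mother", "sib", "friend", ... (hypothetical)
    next_sibs -- names of the sibs adjacent to the child in birth order
    """
    family = [r for name, r in ranks.items() if roles[name] != "friend"]
    friends = [r for name, r in ranks.items() if roles[name] == "friend"]
    parents = [r for name, r in ranks.items() if roles[name] in ("father", "mother")]
    n_sibs = sum(1 for name in ranks if roles[name] == "sib")
    highest_in_family = max(family)
    highest_overall = max(ranks.values())

    points = 0
    # Two or more sibs, and a sib next to the subject has the highest family number.
    if n_sibs >= 2 and any(ranks[name] == highest_in_family for name in next_sibs):
        points += 1
    # A friend is given a lower number than some member of the family.
    if friends and min(friends) < highest_in_family:
        points += 2
    # Parents separated by two ratings, or by more than two.
    if len(parents) == 2:
        gap = abs(parents[0] - parents[1])
        if gap == 2:
            points += 2
        elif gap > 2:
            points += 4
    # A parent receives the highest number (assumed: in the whole list).
    if parents and max(parents) == highest_overall:
        points += 2
    return points

# Hypothetical youngest child of three, modal order, friends ranked last.
roles_a = {"mother": "mother", "father": "father", "older_sister": "sib",
           "older_brother": "sib", "boy_friend": "friend", "girl_friend": "friend"}
ranks_a = {"mother": 1, "father": 2, "older_sister": 3,
           "older_brother": 4, "boy_friend": 5, "girl_friend": 6}
score_a = rogers_family_score(ranks_a, roles_a, ["older_sister"])

# Hypothetical child with one sib, a friend ranked second, parents at 1 and 3.
roles_b = {"father": "father", "mother": "mother", "brother": "sib",
           "boy_friend": "friend", "girl_friend": "friend"}
ranks_b = {"father": 1, "boy_friend": 2, "mother": 3, "girl_friend": 4, "brother": 5}
score_b = rogers_family_score(ranks_b, roles_b, ["brother"])

print(score_a, score_b)
```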


The data, scores obtained from the third- 
graders’ preferences among their respective 
family members and classroom friends, were 
treated by analyses of variances associated with 
the variables of school, sex and criterion group 
(ratings of general adjustment); and by chi
square analyses where frequencies constituted
the data. 


Results 


The series of analyses of the children’s re- 
sponses tested their preferences for one or both 
friends over family members, one or more sibs 
over the parents, grandparents over parents, 
father over mother, and sib next to subject least 
preferred in the array. Among the 46 boys, 14 
listed preferences in a modal or ‘‘normal’’ or- 
der, as did 19 of the 42 girls. This order listed 
mother, father, sibs and friends, successively, 
and the sib next to the subject was not least pre- 
ferred. Thus, a modal pattern, following an 
a priori hypothesis about order of preference, 
was exhibited by more than a third of the sub- 
jects. Two other patterns appeared—father pre- 
ferred to mother, and sibling next to the subject 
in age rated as least preferred. Twelve children
exhibited the former pattern and 15 the sibling
order. In seven other instances the children
rated a sibling as more preferred than one
of the parents. Each of the remaining seven
patterns was indicated by only 1-4 children. Thus,
67 of the 88 children rated their preferences in 
one of four patterns. 

Chi square analyses of frequency distributions
(using Yates’ correction for discontinuity, 12)
by criterion group, sex, and patterns of
preferences, totalling 22 analyses for the various
distributions, yielded no significant probabilities for
the chi square values. These values ranged from
.00 to 2.60, none approximating the values
required for a significant probability (3.84 at
P = .05; 6.64 at P = .01). These findings indicate that there
were not significant relationships, for either boys 
or girls, between preference patterns and general 
adjustment as rated by teachers and trained men- 
tal health workers. This was true also for the 
pattern of ‘‘normality’’ which did not occur signif- 
icantly more often among boys or girls, or among 
those rated as adjusted or poorly adjusted. 
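
A fourfold chi square with Yates’ correction can be sketched as follows. The cell counts are derived from the modal-pattern frequencies reported above (14 of 46 boys and 19 of 42 girls); whether the authors analyzed exactly this table is an assumption, and the function name is illustrative. The critical values are those quoted in the text for one degree of freedom.

```python
def yates_chi_square(a, b, c, d):
    """Chi square for a 2 x 2 table [[a, b], [c, d]] with Yates' correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Reduce |ad - bc| by n/2, never letting the corrected value go below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    return n * corrected ** 2 / (row1 * row2 * col1 * col2)

CRITICAL_05 = 3.84  # value quoted in the text for P = .05
CRITICAL_01 = 6.64  # value quoted in the text for P = .01

# Rows: boys, girls. Columns: modal pattern, any other pattern.
chi = yates_chi_square(14, 32, 19, 23)
significant_05 = chi >= CRITICAL_05
print(round(chi, 2), significant_05)
```

The resulting value falls inside the reported range of .00 to 2.60 and below the .05 critical value, in line with the finding that the modal pattern was not significantly more frequent in either sex.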

The size of the family ranged from one child 
and his parents to nine children and their parents, 
with the mean number of children 2.6 per family. 
The households were distributed as follows: 


Family Constellation 


Grandfather with family 

Grandmother with family 

Both parents with family 

Other adults with family 

Only one child in family 

Parents and two or more 
children 


Analyses of the variances associated with size 
of family, sex of the third-graders, and criterion
group assignment (adjustment level) also did not 
produce significant probabilities, either for these 
effects or for interactions among the effects. The 
F-ratios varied from .87 to 2.26, values well be- 
low those required even at the .05 level of signif- 
icance. 

In treating the data on presence of grand- 
parents in the home and this effect upon prefer- 
ences and general adjustment, the incidence of 
grandparents in the home was low, as reported 
above, with about twelve percent of the children 
living in households with grandparents as mem- 
bers. As with the analyses just reported, there 
were not significant differences between prefer- 
ences and adjustments of children with grandpar- 
ents living in the family home and those without 
grandparents in the home. 


Discussion 


The series of hypotheses tested here had been 
derived from clinical experience and constituted 
the principal elements of the design. Except for 
the ‘‘modal’’ pattern, these hypotheses did not 
find support in the data obtained from the 91
third-grade children in the present study. It is not
uncommon for clinical ‘‘hunches’’ and other
experiences to go unsupported in controlled studies.
This has been demonstrated variously by Kelly and
Fiske (7), Meehl (8), and Rogers et al. (11). In
these assessment and prediction studies, follow-ups
of the clinical interview, case history and
test data did not successfully predict behavior in 
spite of the clinical traditions surrounding their 
use. It should be emphasized, however, that 
the investigators were aware of the yet-unsolved 
criterion problems. 

Another interpretation of the present findings 
suggests that these results may be a function of 
the developmental stage at which the data were
obtained: the preadolescent age range of 8-10, in
which the family still dominates the personal 
and social worlds of the third-graders; and ex- 
tra-family group behavior, even in the social 
unit of the classroom, does not yet have greater 
influence. Also, the ordinal position of the child 
in the family hierarchy and the special charac- 
teristics of the homes with grandparents as part 
of the family units may be determinants in chil- 
dren’s preferences. Finally, the a priori, ex- 
pected pattern of intra-family preference did ap- 
pear to a significant degree, suggesting the po- 
tency of this cultural characteristic. 


II. Sociometric Procedure


During the classroom observations described 
earlier, the teacher was out of the room for the 
entire morning period. The classroom was se- 
lected as the social environmental unit and socio- 
metric choices restricted to this unit. These 
data were designed to yield information on both 
the number and quality of peer relationships, 
two of the dimensions of study. Each child was 
given six copies of a class seating chart on which 
all children were identified by name and number.
The names were those used by the children in 
their usual interactions, e.g., ‘‘Larry’’, ‘‘Rus- 
ty’’, ‘‘Bobby’’, rather than the formal, given 
name; and the numbers were those worn for 
identification in a halter arrangement by the chil- 
dren throughout the morning. Six separate 
sheets were used in an attempt to keep as inde- 
pendent as possible the six sociometric choices 
desired from each child. The observers
discouraged the few children who looked at earlier
judgments, reminding them of the instructions 
given previously not to look back (at the sheets 
beneath the one on which they were working at 
the moment). 

The six sets of judgments asked of the chil- 
dren were ‘‘decide which three boys or girls in 
this room you most like’’, ‘‘most like to play 
with’’, ‘‘do not like’’, ‘‘do not like to play with’’, 
‘‘ask you to do things you yourself don’t want to 
do’’, and ‘‘do not ask you to do things you your- 
self don’t want to do’’. The latter two judgments 
were designed to get information about ‘‘demand- 
ingness’’, another of the dimensions considered 
significant in child behavior, as evolved in the 
research program (3,4) from conferences with 


the mental health service workers. The
instructions given the children follow:


Please look at the paper in front of you.
You see that the name of every boy and girl
in this room is on the sheet, and the number
is next to the name. Now decide which
three boys or girls in this room you most
like. (Pause) Put a ‘‘1’’ by each of their
numbers (demonstrated on blackboard).
(Pause) Be sure there are three 1’s on the
sheet, and that these 1’s are next to the
numbers of the boys or girls you most like
in this room. Now put a second ‘‘1’’
(demonstrated) by the number of the boy or girl
you like most of all in this room.
(Observers check responses to see that each child
has three 1’s, one of which is double.) Now
put the sheet under the others on your desk.


Results 


A number of statistical problems arose in treat- 
ing the data, from the disproportional distribution 
of girls and boys, numbering 44 and 47, respec- 
tively, over the four criterion groups (N’s of 22, 
40, 19 and 10), the varying classroom sizes (N’s 
of 35,33 and 23), and the varying restrictions of 
choices among the children. In meeting the first 
problem, a correction for disproportionality (12) 
statistically controlled the differences in N’s over 
the criterion groups; in the latter analysis, re- 
striction of choices was treated as a variable of 
significance in the sociometric designs, i.e., it 
was hypothesized that a child who restricted his
choices to a single classmate differs psychologi- 
cally from another who nominates three different 
peers in his choices. 

In order to handle the second problem, a scor- 
ing system was devised to account for the varying 
numbers of choices. This latter was taken from 
the following nomograph: 


                     Number of Positive Choices
                     0-1    2-3    4-5    6-21

Number of    0-1      4      5      6      7
negative     2-3      3      4
choices      4-5      2      3
             6-14     1      2


It can be seen that a score of 7 was assigned
individuals chosen positively six or more times
and with no, or no more than one, negative choice;
and a score of 1 was given a child who was chosen
negatively six or more times and who had received
no, or no more than one, positive choice. The
coded scores distributed themselves as follows:


Score 1 2 3 4 5 6 7 
N 8 9 11 30 16 10 7 
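
The nomograph scoring can be sketched in a few lines. Every cell that survives in the table, and both corner cases described in the text, fit the rule ‘‘4 plus the positive band minus the negative band’’, where the bands are 0-1, 2-3, 4-5 and 6 or more. Extending that rule to the cells not legible in the table is an assumption, and the function names are illustrative. The sketch also recomputes the overall mean from the score distribution reported above.

```python
def band(count):
    """Map a raw choice count to its band index: 0-1 -> 0, 2-3 -> 1, 4-5 -> 2, 6+ -> 3."""
    if count < 0:
        raise ValueError("choice counts cannot be negative")
    return min(count // 2, 3)

def sociometric_score(positive, negative):
    """Coded score from 1 to 7 (assumed linear rule matching the visible cells)."""
    return 4 + band(positive) - band(negative)

# Corner cases described in the text.
print(sociometric_score(6, 0))  # chosen positively six or more times, no negative choices
print(sociometric_score(0, 6))  # chosen negatively six or more times, no positive choices

# Distribution of coded scores as reported (score -> number of children).
distribution = {1: 8, 2: 9, 3: 11, 4: 30, 5: 16, 6: 10, 7: 7}
n_children = sum(distribution.values())
mean_score = sum(score * n for score, n in distribution.items()) / n_children
print(n_children, round(mean_score, 2))
```

The distribution sums to the 91 children in the sample, and its mean of about 4.04 agrees with the school means reported in the following paragraph.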


Analyses of distributions of scores by sex, 
school and criterion group were made, with the 
results indicating significant differences (P = .01) 
among the latter, but not for the sex and school 
variables. The mean scores of the criterion 
groups ranged from 5.0 to 3.2—5.0, 4.2, 3.2 
and 3.2 for groups 1 (best-adjusted, according
to teacher and mental health service worker rat- 
ings) to 4 (in need of or receiving psychiatric 
help), respectively. Mean scores of the boys 
varied from 4.5 to 3.0, and for the girls from 
5.1 to 3.1, but the differences between the sexes 
(boys’ mean, 3.8; girls’ mean, 4.3) were not 
significant. Mean scores among the three 
schools did not vary significantly (4.03, 4.06 
and 4.04), nor did means of the various restric- 
tion groups, termed ‘‘restriction of interperson- 
al relationships’’. The latter consisted of three 
categories of children—in the first were those 
children who selected a different peer for each 
of the sociometric choices, in the second were 
those who selected the same child for two of the 
three choices (either positive or negative), and 
in the third were those who nominated the same 
peer for each of the three choices, i.e., a child 
restricted his interpersonal relations to a single
classmate in his positive choices and to a second
classmate in his negative choices. The means 
of these three groups were 4.0, 4.1 and 4.1, re- 
spectively. Although these mean sociometric 
scores did not vary significantly, interaction ef- 
fects of school and IPR (the ‘‘interpersonal re- 
striction’’ manifested by the children in select- 
ing sociometric choices) were significant, indi- 
cating that the association between IPR and 
school varied from school to school. Thus, in 
Schools 1 and 2, children who selected a differ- 
ent peer for each sociometric choice were 
chosen positively more often than in School 3; 
but in School 3 the mean sociometric score was highest
(5.0, contrasted with 2.25 and 3.80 for Schools 
1 and 2, respectively) for those children who re- 
stricted their choices. 

The teachers’ ratings of general adjustment 
were consistent with the sociometric choices of 
the pupils, with the latter’s choices ranging in 
scores from 4.2 to 5.1 for the students rated 1 
or 2, and 3.0-3.3 for the students rated 3 or 4 
by the teachers. It should be remembered that 
the ratings and the sociometric scores were ob- 
tained independently, although, of course, both 
teachers and pupils may have based some or all 
of their judgments on the same behaviors. These
data then indicate that, with teachers’ ratings
as independent criteria, students’ sociometric
choices as early as the third grade of school
significantly differentiate the levels of adjustment of
the pupils. And this discrimination occurs on the
basis of student choices related to ‘‘like’’, ‘‘play
with’’, and demandingness, without reference to
the adjustment dimension along which teachers
rated the children.

In another analysis, by the chi square method, 
the frequency with which boys chose other boys or 
chose girls, and vice versa, was examined. For 
example, did the boys and girls tend to pick the 
same or opposite sex classmates for their posi- 
tive and negative choices? The chi square proba- 
bility values (P) varied by school and sociometric 
variable, as shown below. (NS indicates that the
obtained chi square was not significant at the .05 level.)


School
Sociometric Area 1 2 3 T


Like most NS .05 NS
Like most to play with .01 ° .05 .01
Least demanding NS NS NS
Like least .05 .05 .01
Like least to play with .01 .01 .05
Most demanding NS NS NS


These data predominantly show significant 
sex discriminations only in the play area, i.e., 
boys and girls chose their same sex significant- 
ly more often than they chose the opposite sex. 
In contrast, on the ‘‘demandingness’’ dimension, 
there were no significant discriminations, with 
both boys and girls choosing their same or the op- 
posite sex in roughly similar proportions. In the 
‘‘liking’’ area, only three of the six chi squares
had significant P-values, with the children of 
School 2 not discriminating on the basis of sex, 
unlike the School 3 boys and girls. With the one 
exception in the area of ‘‘like most to play with’’, 
School 2 children did not select the same sex in 
their choices any more often than they chose 
classmates of the opposite sex, unlike the stu- 
dents of Schools 1 and 3. In general, the data in- 
dicate that sex differences are sharply drawn for 
play but are less distinct for ‘‘liking’’, and do not 
seem to operate in interactions in which these 
third-graders perceive classmates as ‘‘demanding’’.


Discussion


The present experiment has resulted in a system
of scores which summarizes both the number and
quality (positive or negative) of sociometric
choices. These scores distributed themselves
normally, and significantly discriminated the
four criterion groups. Thus, criterion ratings
from trained mental health workers and classroom
teachers, and student choices independently
made, were significantly related. Distributions
of scores between the sexes and among the
schools showed no significant variations along
these two dimensions.

The hypothesis about the relationship of general
adjustment, as rated by teachers and mental
health workers, to the degree to which the
subjects restricted their sociometric choices,
was not supported by the data. The analysis was
designed to test the idea that the more disturbed
children, as judged by adults, make fewer
relationships than those students who have been
rated as better adjusted. The significant
interaction effects between school and IPR
(interpersonal restriction) do, however, indicate
that the relationship between these factors
varies from school to school. There was, further,
support for the hypothesis that IPR is related,
in peer judgments, to adjustment, for the data
demonstrate that children who had but few
interactions in the classroom were less
frequently positively chosen and more often
negatively chosen than those whose range of
sociometric choice indicated greater adjustment
in the social situation.

Sex differences were consistently observed, with
but one exception, only in the dimension of play;
and just the opposite obtained in the dimension
of demandingness, where boys and girls did not
discriminate sex in their choices, whether
positive or negative, and no more often chose the
same or the opposite sex. Again, there were
differences among the schools, differences which
have been found consistently with respect to
other measures in the overall evaluation program.

Another set of observations consisted of a series
of sociometric choices: “like most”, “like to
play with”, “not demanding”, “like least”, “do
not like to play with”, and “demanding”. These
data are reported in terms of the number and
quality of peer relationships, their relationship
to ratings of general adjustment, and sex and
school differences. To enable the treatment of
data gathered from the children’s sociometric
choices, a scoring system has been devised to
provide an index of both number and direction
(positive or negative) of choice. It was found
that these sociometric scores significantly
differentiated the four levels of mental health
used as criterion measures. Further, it is
significant that there is high correspondence
between the criterion ratings by trained mental
health workers and by classroom teachers, and the
independently made pupil sociometric choices.
Thus, these data indicate that, with ratings by
trained adult raters as independent criteria,
students’ sociometric choices as early as the
third grade of school significantly differentiate
the four levels of psychological adjustment of
school children here specified. Finally, children
with but few classroom interactions are more
often negatively chosen and less often positively
chosen by their classmates, suggesting that the
degree of interaction in the class may be one of
the socio-psychological dimensions along which
the children range their evaluations of
classmates. The data summarized here also
indicate significant variations in these
interactions from school to school.


1. This test was developed by Rogers “while on a
   fellowship at the Institute for Child Guidance,
   New York City. The subjects used were ‘problem’
   children referred for intensive study and
   treatment. Briefly, the method was as follows:
   Detailed ratings on each child were obtained
   from clinic workers (psychiatrists,
   psychologists, social workers) who had intimate
   knowledge of and contact with the child. These
   ratings were then compared with the child’s
   responses on the test. It was found that
   children with a poor group adjustment (those
   who felt inferior socially) tended to give
   certain responses. Daydreaming children tended
   to give other responses. And so on with other
   types. From these typical responses it was
   possible to build up a scoring system which
   applied to other children.” (10)